
Validator types

Overview of supported Validator types used for calculating metrics.

📘

Validators calculate metrics over a window

For example, a Validator calculates the mean value over a daily window and then validates whether these daily means follow an expected seasonal pattern.
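The windowing step can be illustrated with a minimal Python sketch. It only shows how a daily mean might be derived from timestamped values; the function name `daily_means` and the sample records are hypothetical and not part of the product.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(records):
    """Group (timestamp, value) pairs by calendar day and return the mean per day."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        buckets[timestamp.date()].append(value)
    # One metric value per daily window, ordered by day.
    return {day: mean(values) for day, values in sorted(buckets.items())}


# Hypothetical sample data: two datapoints on the first day, one on the second.
records = [
    (datetime(2024, 1, 1, 9, 0), 10.0),
    (datetime(2024, 1, 1, 17, 0), 20.0),
    (datetime(2024, 1, 2, 9, 0), 30.0),
]
result = daily_means(records)
```

Each key in `result` is one window, and the series of per-window values is what would then be checked against an expected pattern.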

List of Validator types and supported metrics

Single Source Validators

Single Source Validators calculate metrics based on one dataset.

| Validator type | Metric | Description |
| --- | --- | --- |
| Numeric | Mean | Validates the mean of a numerical field. |
| Numeric | Maximum | Validates the maximum of a numerical field. |
| Numeric | Minimum | Validates the minimum of a numerical field. |
| Numeric | Standard deviation | Validates the standard deviation of a numerical field. |
| Relative time | Minimum difference | Validates the minimum difference between two date fields. |
| Relative time | Maximum difference | Validates the maximum difference between two date fields. |
| Relative time | Mean difference | Validates the mean difference between two date fields. |
| Volume | Count | Validates the total number of rows. |
| Volume | Percentage | Validates the percentage of rows that pass certain filter criteria. |
| Volume | Duplicate count | Validates the number of duplicates. |
| Volume | Duplicate percentage | Validates the percentage of duplicates. |
| Volume | Unique count | Validates the number of unique rows. |
| Volume | Unique percentage | Validates the percentage of unique rows. |
| Freshness | Freshness | Validates the time elapsed since the data was last updated. |
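As a rough illustration of the Relative time and Freshness metrics, the sketch below computes differences between two date fields and the time elapsed since the latest update. The field names (`ordered_at`, `shipped_at`), the choice of seconds as the unit, and both function names are assumptions made for this example, not the product's API.

```python
from datetime import datetime
from statistics import mean


def relative_time_metrics(rows, start_field, end_field):
    """Minimum, maximum and mean difference (in seconds) between two date fields."""
    diffs = [(row[end_field] - row[start_field]).total_seconds() for row in rows]
    return {"minimum": min(diffs), "maximum": max(diffs), "mean": mean(diffs)}


def freshness(rows, updated_field, now):
    """Seconds elapsed between the most recent update and `now`."""
    latest = max(row[updated_field] for row in rows)
    return (now - latest).total_seconds()


# Hypothetical rows with two date fields each.
rows = [
    {"ordered_at": datetime(2024, 1, 1), "shipped_at": datetime(2024, 1, 2)},
    {"ordered_at": datetime(2024, 1, 1), "shipped_at": datetime(2024, 1, 4)},
]
metrics = relative_time_metrics(rows, "ordered_at", "shipped_at")
age = freshness(rows, "shipped_at", now=datetime(2024, 1, 5))
```

Passing `now` explicitly keeps the sketch deterministic; a real check would typically use the current time.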

Reference Validators

Reference Validators calculate metrics based on multiple fields from two different datasets. The two datasets can either be derived from completely different Sources, or from different windows within the same Source.

These Validators only calculate metrics if there is data in the target dataset. For example, a Categories removed Validator yields a result of 1 when the reference dataset has 4 categories and the target dataset has 3 of them. Conversely, if the target dataset has 0 categories, the Validator does not yield any result, because the target dataset has no data to calculate metrics on.
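The behavior described above can be sketched in a few lines of Python. The function `categories_removed` is a hypothetical stand-in, and returning `None` is just one way to represent "no result".

```python
def categories_removed(reference, target):
    """Count categories present in the reference dataset but missing from the target.

    Returns None when the target has no data, mirroring the rule that
    no metric is calculated on an empty target dataset.
    """
    if not target:
        return None
    return len(set(reference) - set(target))


reference = ["red", "green", "blue", "yellow"]  # 4 categories
removed = categories_removed(reference, ["red", "green", "blue"])  # 3 categories
no_result = categories_removed(reference, [])  # empty target dataset
```

Here `removed` is 1 and `no_result` is `None`, matching the two cases in the paragraph.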

| Validator type | Metric | Description |
| --- | --- | --- |
| Numeric anomaly | Count | Number of datapoints identified as anomalies. |
| Numeric anomaly | Percentage | Percentage of datapoints identified as anomalies. |
| Numeric distribution | Relative entropy | Validates differences in the distribution between two datasets. |
| Numeric distribution | Mean ratio | Validates the ratio of the mean between two datasets. |
| Numeric distribution | Maximum ratio | Validates the ratio of the maximum value between two datasets. |
| Numeric distribution | Minimum ratio | Validates the ratio of the minimum value between two datasets. |
| Numeric distribution | Standard deviation ratio | Validates the ratio of the standard deviation between two datasets. |
| Relative volume | Count ratio | Validates the ratio of the number of rows in the target dataset to the number of rows in the reference dataset. |
| Relative volume | Percentage ratio | Validates the number of rows in the target dataset as a percentage of the number of rows in the reference dataset. |
| Categorical distribution | Categories added | Validates the number of new categories in the target dataset against a reference dataset. |
| Categorical distribution | Categories removed | Validates the number of removed categories in the target dataset against a reference dataset. |
| Categorical distribution | Categories changed | Validates the number of changed categories in the target dataset against a reference dataset. |
| Categorical distribution | Relative entropy | Validates differences in the distribution of a categorical field between two datasets. |
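Relative entropy is also known as the Kullback-Leibler divergence. The sketch below shows one way it could be computed for a categorical field. The direction (target relative to reference), the natural logarithm, the handling of categories missing from the reference, and the function name are all assumptions for illustration; the product may compute the metric differently.

```python
import math
from collections import Counter


def relative_entropy(target, reference):
    """KL divergence D(target || reference) in nats over category frequencies.

    Returns None if either dataset is empty, and infinity if the target
    contains a category that never occurs in the reference.
    """
    if not target or not reference:
        return None
    p_counts, q_counts = Counter(target), Counter(reference)
    p_total, q_total = len(target), len(reference)
    divergence = 0.0
    for category, count in p_counts.items():
        p = count / p_total
        q = q_counts.get(category, 0) / q_total
        if q == 0:
            return math.inf  # target has mass where the reference has none
        divergence += p * math.log(p / q)
    return divergence


same = relative_entropy(["a", "a", "b", "b"], ["a", "b"])  # identical proportions
skewed = relative_entropy(["a"], ["a", "b"])               # target concentrated on "a"
```

A value of 0 means the two category distributions have the same proportions, and larger values indicate a larger difference.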