Configure Validators to track a specified metric, calculated from one or more fields. Then define a Threshold to catch incidents in the validated data.
Map of the territory
This page explains the concept of Validators in Validio and provides useful context. Details on specific Validators are listed under Validator types.
What is a Validator?
A Validator is configured in Validio to monitor data from your Source. A Validator is always attached to a Source and calculates the desired metric over the specified Window.
You can use filters to exclude specific datapoints from the Validator calculation.
What does a Validator monitor?
A Validator monitors a metric calculated from one or more fields over a Window. For example:
- Mean of
- Cardinality of
- Standard deviation of
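To make the idea of a metric calculated over a Window concrete, here is a minimal Python sketch. It is not Validio's implementation: the function name `window_metrics` and the sample values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def window_metrics(values):
    """Compute example metrics over the field values of a single window.

    Hypothetical helper for illustration only; not part of Validio.
    """
    return {
        "mean": mean(values),
        "cardinality": len(set(values)),  # number of distinct values
        "stddev": pstdev(values),         # assumption: population std dev
    }


# Field values that fell into one window (made-up data).
window = [2, 4, 4, 4, 5, 5, 7, 9]
metrics = window_metrics(window)
print(metrics)
```

Each window of data yields one value per metric, and that value is what a Validator tracks over time.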
A Validator can also track reference metrics (statistics), that is, metrics produced by comparing fields from two different datasets. For example:
- Ratio between two means: mean of table1_age and mean of
- Relative entropy between
table2_basket_size, used to check distribution shifts. For more information, refer to Numeric distribution or Categorical distribution Validator types.
- Number of new categories in
A Validator tracks a scalar value or a string value (for example, the mode of a categorical field), which is calculated from one or several fields as in the preceding examples.
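The reference metrics listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Validio's implementation: all function names are hypothetical, relative entropy is computed as the Kullback-Leibler divergence between empirical category frequencies (in nats), and no smoothing is applied, so a category absent from the reference gives an infinite result.

```python
import math
from collections import Counter


def ratio_of_means(a, b):
    """Ratio between the mean of field a and the mean of field b."""
    return (sum(a) / len(a)) / (sum(b) / len(b))


def relative_entropy(p_samples, q_samples):
    """KL divergence D(P || Q) between two empirical categorical distributions.

    Assumption: no smoothing; returns infinity if P has a category Q lacks.
    """
    p_counts, q_counts = Counter(p_samples), Counter(q_samples)
    n_p, n_q = len(p_samples), len(q_samples)
    total = 0.0
    for category, count in p_counts.items():
        p = count / n_p
        q = q_counts[category] / n_q
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total


def new_categories(current, reference):
    """Number of categories in current that never appear in reference."""
    return len(set(current) - set(reference))
```

A relative entropy of zero means the two empirical distributions are identical. Larger values indicate a larger distribution shift.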
Metric in this context
Metric in this context should not be confused with the mathematical definition of Metric.
How do I configure a Validator?
You can configure a Validator in two ways:
- Configure Validators according to your specifications.
- Let Validio set up recommended Validators for you.
Validio can set up recommended Validators to help you get started with your data monitoring.
Thresholds are used to define what values of the calculated Validator metric should be considered an incident.
You can set up either manual or smart (automatic) Thresholds to identify incidents. All metric values that breach your Threshold are flagged as incidents, which can be collected and sent as notifications.
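As a rough illustration of a manual Threshold, the following Python sketch flags metric values that fall outside fixed bounds. The function `find_incidents`, its `lower` and `upper` parameters, and the sample data are hypothetical. Smart Thresholds are not modeled here.

```python
def find_incidents(metric_values, lower=None, upper=None):
    """Return the (window, value) pairs whose value breaches a fixed bound.

    Hypothetical helper: a bound set to None is not checked.
    """
    incidents = []
    for window, value in metric_values:
        below = lower is not None and value < lower
        above = upper is not None and value > upper
        if below or above:
            incidents.append((window, value))
    return incidents


# One metric value per window (made-up data).
daily_mean_price = [("day 1", 480.0), ("day 2", 510.0), ("day 3", 1250.0)]
incidents = find_incidents(daily_mean_price, lower=400.0, upper=600.0)
print(incidents)
```

Only the value for "day 3" breaches the upper bound, so it is the only one returned as an incident.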
Filters allow you to determine which raw data is validated.
By using filters, you can exclude certain datapoints in the metrics calculation for a Validator. For Validators that offer egress, you can define which datapoints are written to your Destination.
Example of using filters
The following example illustrates how you can apply a filter in your Validator:
|price||price between 100-1000?|
This is a conceptual model of what happens in the ‘filtering’ stage. Note: No actual column is created in your data. This model is for explanation purposes only.
Only datapoints passing the filter, which in this case is greater than 100, are included in the metric calculation. Whether it is a row count Validator or a mean Validator, datapoints that do not pass the filter logic are excluded from the metric calculation.
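The filtering stage can be sketched in Python as shown below. This is a conceptual illustration only: the `passes_filter` predicate, the sample prices, and the inclusive bounds of 100 and 1000 (taken from the table header above) are assumptions.

```python
from statistics import mean


def passes_filter(price):
    """Assumed filter condition: price between 100 and 1000, inclusive."""
    return 100 <= price <= 1000


# Raw datapoints in one window (made-up data).
prices = [50, 150, 999, 1200, 100]

# Only datapoints that pass the filter reach the metric calculation.
kept = [p for p in prices if passes_filter(p)]

row_count = len(kept)    # what a row count Validator would see
mean_price = mean(kept)  # what a mean Validator would see
print(kept, row_count, mean_price)
```

The raw data is left unchanged. The filter only decides which datapoints feed the metric, regardless of which metric the Validator calculates.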
Backfill is used to load and view historical data in your validations. It is also used to train algorithms on historical data, so that Validators can provide value from day one.
Typically, Validators are backfilled with historical data when you start a Source for the first time, if historical data is available. On a Source that is already started, you can select the backfill option when configuring new Validators if you want to load historical data.
Pending backfill on an already started Source
If you select the backfill option when you create a Validator, the Validator is put into the Pending Backfill state. You must trigger a manual backfill on the Source to load the data into the Validator.