Overview

Configure Validators to track a specified metric, calculated from one or more fields. Then, define a Threshold to catch incidents in the data validation.

📘

Map of the territory

This page explains the concept of Validators in Validio and provides useful context. Details on specific Validators are listed under Validator types.

What is a Validator?

A Validator is configured in Validio to monitor data from your source. A Validator is always attached to a Source and calculates the desired metric over the specified Window.

You can use filters to exclude specific datapoints from the Validator calculation.

An example of Validator details.

An example of a Freshness Validator.

What does a Validator monitor?

A Validator monitors a metric calculated from one or more fields over a Window. For example:

  • Mean of Age
  • Cardinality of Country
  • Standard deviation of Price
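As a rough illustration of the three examples above, the following sketch computes one value per metric from the rows of a single Window. It is plain Python, not Validio's API: the `window_rows` data and the `window_metrics` helper are hypothetical, and the choice of sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Hypothetical rows that fall inside one Window (for example, one day of data).
window_rows = [
    {"age": 31, "country": "SE", "price": 10.0},
    {"age": 45, "country": "NO", "price": 12.0},
    {"age": 38, "country": "SE", "price": 14.0},
]


def window_metrics(rows):
    """Compute one value per metric from all rows in a single window."""
    return {
        # Mean of Age
        "mean_age": mean(r["age"] for r in rows),
        # Cardinality of Country: number of distinct values
        "cardinality_country": len({r["country"] for r in rows}),
        # Standard deviation of Price (sample standard deviation is an assumption)
        "stddev_price": stdev(r["price"] for r in rows),
    }


metrics = window_metrics(window_rows)
print(metrics)
```

Each metric reduces all rows in the Window to a single value, which is what a Threshold is later compared against.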

A Validator can also track reference metrics (statistics), that is, metrics produced by comparing fields from two different datasets. For example:

  • Ratio between two means: mean of table1_age and mean of table2_age
  • Relative entropy between table1_basket_size and table2_basket_size, used to check distribution shifts. For more information, refer to Numeric distribution or Categorical distribution Validator types.
  • Number of new categories in table1_country compared to table2_country
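The reference metrics above can be sketched in plain Python as follows. This is a conceptual sketch only, not how Validio computes them: the function names are hypothetical, the relative entropy is computed over discrete values in nats, and it assumes every value in the first dataset also occurs in the second. Which dataset plays the role of reference is likewise an assumption.

```python
import math
from collections import Counter
from statistics import mean


def ratio_of_means(source, reference):
    """Ratio between the mean of the source field and the mean of the reference field."""
    return mean(source) / mean(reference)


def relative_entropy(source, reference):
    """Relative entropy (KL divergence) D(source || reference) in nats.

    Assumes discrete values and that every value in `source`
    also occurs in `reference`, so no probability is zero.
    """
    p, q = Counter(source), Counter(reference)
    n_p, n_q = len(source), len(reference)
    return sum(
        (count / n_p) * math.log((count / n_p) / (q[value] / n_q))
        for value, count in p.items()
    )


def new_categories(source, reference):
    """Number of categories present in source but absent from reference."""
    return len(set(source) - set(reference))


# Hypothetical field values, named after the examples in the text.
table1_age, table2_age = [20, 40], [10, 20]
table1_basket_size, table2_basket_size = [1, 1, 2, 2], [1, 2, 2, 2]
table1_country, table2_country = ["SE", "NO", "DK"], ["SE", "NO"]

print(ratio_of_means(table1_age, table2_age))
print(relative_entropy(table1_basket_size, table2_basket_size))
print(new_categories(table1_country, table2_country))
```

A relative entropy of zero means the two distributions are identical; larger values indicate a larger shift.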

A Validator tracks a scalar value or a string value (for example, the mode of a categorical field). The value is calculated from one or several fields, as in the preceding examples.

🚧

Metric in this context

Metric in this context means a calculated statistic. It should not be confused with the mathematical definition of a metric (a distance function).

How do I configure a Validator?

You can configure a Validator in two ways:

  1. Configure Validators according to your specifications.
  2. Let Validio set up recommended Validators for you.

Recommended Validators

Validio can set up recommended Validators to help you get started with your data monitoring.

Thresholds

Thresholds are used to define what values of the calculated Validator metric should be considered an incident.

You can set up either manual or smart (automatic) Thresholds to identify incidents. All metric values that breach your Threshold are flagged as incidents, which can be collected and sent as notifications.
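To make the idea of a manual Threshold concrete, here is a minimal sketch that flags metric values outside a fixed range. It is not Validio's implementation: `find_incidents`, the inclusive bounds, and the sample values are all assumptions made for illustration. Smart Thresholds are not covered because their behavior is not described here.

```python
def find_incidents(metric_values, lower, upper):
    """Return the (window, value) pairs whose value falls outside [lower, upper]."""
    return [
        (window, value)
        for window, value in metric_values
        if value < lower or value > upper
    ]


# Hypothetical mean-of-age metric, one value per Window.
mean_age_per_window = [
    ("2024-01-01", 38.2),
    ("2024-01-02", 37.9),
    ("2024-01-03", 52.4),
]

incidents = find_incidents(mean_age_per_window, lower=30.0, upper=45.0)
print(incidents)
```

Only the third Window breaches the range, so it is the only one reported as an incident.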

Filters

Filters allow you to determine which raw data is validated.

By using filters, you can exclude certain datapoints in the metrics calculation for a Validator. For Validators that offer egress, you can define which datapoints are written to your Destination.

Example of using filters

The following example illustrates how you can apply a filter in your Validator:

price    price between 100-1000?
14       FALSE
455      TRUE
324      TRUE
39       FALSE
5589     TRUE

This is a conceptual model of what happens in the 'filtering' stage. Note: No actual column is created in your data. This model is for explanation purposes only.

Only datapoints that pass the filter (in this case, price greater than 100) are included in the metric calculation. Whether it is a row count Validator or a mean Validator, datapoints that fail the filter are excluded from the metric calculation.
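The filtering stage can be sketched in a few lines of plain Python using the example prices above. This is an illustration under assumptions, not Validio's filter syntax: the variable names are hypothetical, and the condition follows the "greater than 100" rule stated in the text.

```python
from statistics import mean

# Example prices from the conceptual model above.
prices = [14, 455, 324, 39, 5589]

# Filtering stage: keep only datapoints with price greater than 100.
passing = [p for p in prices if p > 100]

# Both metrics are calculated from the passing datapoints only.
row_count = len(passing)
mean_price = mean(passing)

print(passing, row_count, mean_price)
```

The two excluded datapoints (14 and 39) affect neither the row count nor the mean.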

Backfill

Backfill is used to load and view historical data in your validations. It is also used to train algorithms on historical data, so that Validators can provide value from day one.

Typically, Validators are backfilled with historical data when you start a Source for the first time, if historical data is available. On an already started Source, you can select the backfill option when configuring new Validators to load historical data.

📘

Pending backfill on an already started Source

If you select the backfill option when you create a Validator, the Validator is put into the Pending Backfill state. You must trigger a manual backfill on the Source to load the data into the Validator.