Validator Recipes

Examples of common and useful Validator configurations

Validators can be used to validate data for a huge number of use cases. Some recipes for common and useful validations are described below.

πŸ‘

For even more customizability we recommend exploring the Custom SQL recipes , and the SQL filter recipes.


Ensure data is up to date, for example data from every supplier

Validates freshness, for every segment defined by a segmentation field

Wizard stepSettingValue
Validator typeValidator typeFreshness
Source configSegmentationSegmentation field, e.g. supplier_id
ThresholdThreshold typeDynamic threshold

Note that polling interval and window size have a significant impact on freshness validation.

For instance, consider supplier updates occurring around XX:40 every hour, and you have set the window size to hourly. Freshness is assessed against the end of the window, making the expected freshness 20 minutes. If one supplier's data is updated late, say 09:55, the freshness will be reduced to 5 minutes, which is captured by the dynamic thresholds as an anomaly.

Additionally, in this scenario, setting the polling interval to occur every whole hour offers two advantages. First, it enables immediate freshness evaluation right after polling, facilitating prompt notifications. Second, it helps avoid late arrivals. For instance, data with a timestamp of 10:40 arriving at 11:02 will not affect freshness as the poll closes the window at 11:00.

🚧

Note that a very frequent polling interval can impact cost and performance, depending on, for example, amount of data and other validators running on the same Source.


Validate a metric per segment, such as revenue per product group

Validates an aggregated metric, for every segment defined by a segmentation field

Wizard stepSettingValue
Validator typeValidatorNumeric
ConfigMetricSum
Source configFieldField in question, e.g. revenue
Source configSegmentationSegmentation field, e.g. product_group
ThresholdThreshold typeDynamic threshold

Useful variants:

  • Some validations benefit from other metrics than Sum, for example Mean.
  • You can segment on multiple fields, e.g. product_group and country.

Count NULLs

Counts records where a field value is NULL

Wizard stepSettingValue
Validator typeValidatorVolume
ConfigMetricCount
Source configFieldNo field (use record)
Source configFilter typeNull filter
Source configFilter fieldField in question
Source configOperatoris null
ThresholdThreshold typeDynamic threshold

Useful variants:

  • Track percentage of NULLs, by setting Metric to Percentage instead of Count.
  • You can modify the filter to count something other than NULLs, for example count records where a field value is TRUE, 0 or "".
  • Apply a Segmentation to track data per segment, for example, per ad campaign source.
  • Use a Fixed threshold Equal 0 if no NULLs are accepted.

Validate uniqueness, e.g. unique primary keys

Counts duplicates and ensures that value is always 0

Wizard stepSettingValue
Validator typeValidatorVolume
ConfigMetricDuplicates count
Source configFieldsField(s) in question, e.g. customer_id
ThresholdThreshold typeFixed threshold
ThresholdOperatorEqual
ThresholdValue0

Useful variants:

  • Pick multiple fields is useful, for example, when the primary key is a composition of multiple fields.

Validate a calculated ratio, such as conversion rate

Calculates and validates a ratio between two fields, for example sales per lead

Wizard stepSettingValue
Validator typeValidatorNumeric distribution
ConfigMetricSum ratio
Source configFieldNumerator field, e.g. sales
Reference source configSourcesSame as in Source config
Reference source configFieldDenominator field, e.g. leads
Reference source configWindowSame as in Source config
Reference source configWindow offset0
Reference source configNumber of windows1
ThresholdThreshold typeDynamic threshold

Useful variants:

  • Some validations benefit from other metrics than Sum ratio, for example Mean ratio.
  • Apply a Segmentation to track data per segment, for example, per customer group.

Validate a time difference, such as mean session duration

Calculates and validates the difference between to timestamp fields

Wizard stepSettingValue
Validator typeValidatorRelative time
ConfigSource fieldTimestamp field, e.g. session_start
ConfigField to subtractTimestamp field, e.g. session_end
ConfigMetricMean difference
ThresholdThreshold typeDynamic threshold

Useful variants:

  • Some validations benefit from other metrics than Mean difference, for example Maximum difference.
  • Apply a Segmentation to track data per segment, for example, per customer group.