Validator Recipes
Examples of common and useful Validator configurations
Validators can be used to validate data for a huge number of use cases. Some recipes for common and useful validations are described below.
For even more customizability we recommend exploring the Custom SQL recipes , and the SQL filter recipes.
Ensure data is up to date, for example data from every supplier
Validates freshness, for every segment defined by a segmentation field
Wizard step | Setting | Value |
---|---|---|
Validator type | Validator type | Freshness |
Source config | Segmentation | Segmentation field, e.g. supplier_id |
Threshold | Threshold type | Dynamic threshold |
Note that polling interval and window size have a significant impact on freshness validation.
For instance, consider supplier updates occurring around XX:40 every hour, and you have set the window size to hourly. Freshness is assessed against the end of the window, making the expected freshness 20 minutes. If one supplier's data is updated late, say 09:55, the freshness will be reduced to 5 minutes, which is captured by the dynamic thresholds as an anomaly.
Additionally, in this scenario, setting the polling interval to occur every whole hour offers two advantages. First, it enables immediate freshness evaluation right after polling, facilitating prompt notifications. Second, it helps avoid late arrivals. For instance, data with a timestamp of 10:40 arriving at 11:02 will not affect freshness as the poll closes the window at 11:00.
Note that a very frequent polling interval can impact cost and performance, depending on, for example, amount of data and other validators running on the same Source.
Validate a metric per segment, such as revenue per product group
Validates an aggregated metric, for every segment defined by a segmentation field
Wizard step | Setting | Value |
---|---|---|
Validator type | Validator | Numeric |
Config | Metric | Sum |
Source config | Field | Field in question, e.g. revenue |
Source config | Segmentation | Segmentation field, e.g. product_group |
Threshold | Threshold type | Dynamic threshold |
Useful variants:
- Some validations benefit from other metrics than
Sum
, for exampleMean
. - You can segment on multiple fields, e.g.
product_group
andcountry
.
Count NULLs
Counts records where a field value is NULL
Wizard step | Setting | Value |
---|---|---|
Validator type | Validator | Volume |
Config | Metric | Count |
Source config | Field | No field (use record) |
Source config | Filter type | Null filter |
Source config | Filter field | Field in question |
Source config | Operator | is null |
Threshold | Threshold type | Dynamic threshold |
Useful variants:
- Track percentage of NULLs, by setting
Metric
toPercentage
instead ofCount
. - You can modify the filter to count something other than NULLs, for example count records where a field value is TRUE, 0 or "".
- Apply a Segmentation to track data per segment, for example, per ad campaign source.
- Use a
Fixed threshold
Equal
0
if no NULLs are accepted.
Validate uniqueness, e.g. unique primary keys
Counts duplicates and ensures that value is always 0
Wizard step | Setting | Value |
---|---|---|
Validator type | Validator | Volume |
Config | Metric | Duplicates count |
Source config | Fields | Field(s) in question, e.g. customer_id |
Threshold | Threshold type | Fixed threshold |
Threshold | Operator | Equal |
Threshold | Value | 0 |
Useful variants:
- Pick multiple fields is useful, for example, when the primary key is a composition of multiple fields.
Validate a calculated ratio, such as conversion rate
Calculates and validates a ratio between two fields, for example sales per lead
Wizard step | Setting | Value |
---|---|---|
Validator type | Validator | Numeric distribution |
Config | Metric | Sum ratio |
Source config | Field | Numerator field, e.g. sales |
Reference source config | Sources | Same as in Source config |
Reference source config | Field | Denominator field, e.g. leads |
Reference source config | Window | Same as in Source config |
Reference source config | Window offset | 0 |
Reference source config | Number of windows | 1 |
Threshold | Threshold type | Dynamic threshold |
Useful variants:
- Some validations benefit from other metrics than
Sum ratio
, for exampleMean ratio
. - Apply a Segmentation to track data per segment, for example, per customer group.
Validate a time difference, such as mean session duration
Calculates and validates the difference between to timestamp fields
Wizard step | Setting | Value |
---|---|---|
Validator type | Validator | Relative time |
Config | Source field | Timestamp field, e.g. session_start |
Config | Field to subtract | Timestamp field, e.g. session_end |
Config | Metric | Mean difference |
Threshold | Threshold type | Dynamic threshold |
Useful variants:
- Some validations benefit from other metrics than
Mean difference
, for exampleMaximum difference
. - Apply a Segmentation to track data per segment, for example, per customer group.
Updated 9 months ago