Validator overview

Validate individual field values for every datapoint Validio reads by comparing each value with the field values in a reference source. Dynamic anomaly bounds are configured with the sensitivity parameter.

The Numeric anomaly validator identifies anomalies based on either count or percentage:

Count: Counting how many datapoints are identified as an anomaly in each window.
Percentage: Calculating the share of datapoints that are identified as an anomaly in each window.
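The two metrics can be illustrated with a minimal sketch. The function name `summarize_anomalies`, the 0 to 100 percentage scale, and the handling of an empty window are assumptions made for illustration, not Validio's implementation.

```python
def summarize_anomalies(is_anomaly, metric="count"):
    """Summarize per-datapoint anomaly flags for a single window."""
    flagged = sum(1 for flag in is_anomaly if flag)
    if metric == "count":
        return flagged
    if metric == "percentage":
        # Assumption: an empty window reports 0.0 rather than raising an error.
        return 100.0 * flagged / len(is_anomaly) if is_anomaly else 0.0
    raise ValueError(f"unknown metric: {metric}")


# Hypothetical window with 8 datapoints, 2 of which were flagged as anomalies.
window_flags = [False, True, False, False, True, False, False, False]
count = summarize_anomalies(window_flags, "count")
share = summarize_anomalies(window_flags, "percentage")
print(count, share)
```

Count is useful when the absolute number of bad datapoints matters; Percentage is easier to compare across windows that hold different numbers of datapoints.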

Configuration

Step	Required	Parameters	Options
Validator type	✅	Numeric anomaly
Config	✅	Metric	Count Percentage
Config	✅	Sensitivity	Enter a numeric value
Config		Advanced config	Minimum absolute difference Minimum number of reference datapoints Minimum relative difference percent
Config		Backfill	Initialize with backfill (checkbox)
Source fields	✅	Field	List of source fields with numeric data types
Source config	✅	Segmentation	Select a configured Segmentation or Unsegmented (default)
Source config	✅	Window	Select a configured Window
Source config		Filter	No filter (default) Boolean Enum Null (*1) String Threshold filter
Reference source config	✅	Sources	Select a Source to use as reference source
Reference source config	✅	Field	Select a valid field from your reference source
Reference source config	✅	Window	Select a configured Window
Reference source config	✅	Window offset	Select how many Windows you want to offset by
Reference source config	✅	Number of Windows	Select how many Windows to include
Reference source config		Filter	No filter (default) Enum Null (*1) String Threshold Filter
Threshold	✅	Threshold type	Fixed threshold Dynamic threshold
Threshold	✅(*2)	Operator	Less than Less than or equal Equal Not equal Greater than Greater than or equal
Threshold	✅(*2)	Value	Numeric value to validate threshold on
Threshold	✅(*3)	Sensitivity	Enter a numeric value
Threshold	✅(*3)	Decision bounds type	Upper Lower Upper and lower (default)

*1 Only applicable for nullable columns.

*2 Only applicable for Fixed thresholds.

*3 Only applicable for Dynamic thresholds.

Configuration details

Sensitivity

Higher sensitivity means that the accepted range of values is narrower, which identifies more anomalies. Conversely, lower sensitivity values imply a wider range of accepted values, which identifies fewer anomalies.

The Numeric Anomaly validator has two sensitivity settings: The first sensitivity is set during the Config step, and the second sensitivity is set during the Threshold step, for a Dynamic Threshold.

To get the result of the Numeric Anomaly validator, we first calculate the mean and standard deviation (std) of the reference source. We then compute the result that is sent to the Dynamic Threshold (DT) as follows:

1. Compute the offset as the largest of the following quantities: 10/sensitivity * std, minimum absolute difference, and mean * minimum relative difference/100.
2. Calculate the interval (mean - offset, mean + offset).

📘
Note
The offset term 10/sensitivity * std uses the sensitivity defined in the Config step; the Dynamic Threshold step uses the same formula with its own sensitivity. Default values for minimum absolute difference and minimum relative difference are 0.

The result produced by this validator is the number of datapoints outside the interval. This can be summarized by a count or a percentage (whichever you selected in the Config step). For more information, refer to the Validator Example.
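The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not Validio's implementation: the function names are hypothetical, the population standard deviation is assumed, the absolute value of the mean is used for the relative term, and values exactly on a bound are treated as inside the interval.

```python
from statistics import fmean, pstdev


def anomaly_interval(reference, sensitivity, min_abs_diff=0.0, min_rel_diff_pct=0.0):
    """Return (low, high) bounds derived from the reference datapoints."""
    mean = fmean(reference)
    std = pstdev(reference)  # assumption: population standard deviation
    offset = max(
        10 / sensitivity * std,              # sensitivity-scaled spread
        min_abs_diff,                        # absolute floor (default 0)
        abs(mean) * min_rel_diff_pct / 100,  # relative floor (default 0)
    )
    return mean - offset, mean + offset


def validator_metric(current, reference, sensitivity, metric="count", **floors):
    """Count (or share, in percent) of current datapoints outside the interval."""
    low, high = anomaly_interval(reference, sensitivity, **floors)
    outside = sum(1 for x in current if x < low or x > high)
    if metric == "count":
        return outside
    return 100.0 * outside / len(current)


# Hypothetical data: the reference mean is 11 and std is about 1.31.
reference = [10, 12, 11, 9, 13, 10, 12]
current = [11, 3, 18, 12, 25]

n_outside = validator_metric(current, reference, sensitivity=2)
pct_outside = validator_metric(current, reference, sensitivity=2, metric="percentage")
print(n_outside, pct_outside)
```

With a sensitivity of 2, the offset is 5 standard deviations, so the interval is roughly (4.45, 17.55) and the values 3, 18, and 25 fall outside it.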

The following table maps Config Sensitivity values to the corresponding offset, expressed in standard deviations (10/sensitivity):

Config Sensitivity Values	Standard Deviations
3.2	3.1
2.0	5.0
1.2	8.3

For more information, see About Thresholds.
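The relationship between Config Sensitivity and standard deviations follows from the offset term 10/sensitivity * std. A short sketch, using a hypothetical helper name, reproduces the documented values to within one decimal place:

```python
def std_multiple(sensitivity):
    """Number of standard deviations spanned by the offset term 10/sensitivity * std."""
    return 10 / sensitivity


for s in (3.2, 2.0, 1.2):
    print(f"sensitivity {s}: {std_multiple(s):.3f} standard deviations")
```

This holds only while the offset is determined by the sensitivity term, that is, when neither minimum difference setting is larger.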

Advanced config

Under Advanced config, you can set the Minimum absolute difference, Minimum number of reference datapoints, and Minimum relative difference percent.

Minimum absolute difference

The minimum absolute difference between the field value and the mean of the reference distribution for the datapoint to be considered an anomaly.

For example, if set to 10, the difference between the mean of the reference distribution and the datapoint being validated must be greater than 10, and the datapoint must be outside the dynamic bounds, to be considered an anomaly. Essentially, any deviation smaller than this parameter is ignored.

Minimum number of reference datapoints

Minimum number of datapoints required in the reference source before a metric is calculated.

Minimum relative difference percent

Minimum difference for a datapoint to be considered an anomaly, expressed in relative terms: the absolute difference divided by the absolute value of the mean of the reference data.

For example, if the mean of the reference distribution is 10 and you set the parameter to 10%, datapoints falling between 9 and 11 are not considered anomalies.

We recommend this option over minimum absolute difference when you are more interested in the difference relative to the reference mean than in the absolute difference.
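A minimal sketch of how the minimum difference settings act as floors on the offset, using the example above (reference mean 10, relative difference 10%). The function names and the std value of 0.1 are hypothetical, chosen so that the relative floor is the largest term.

```python
def relative_floor(reference_mean, min_rel_diff_pct):
    """Relative difference converted to the units of the field."""
    return abs(reference_mean) * min_rel_diff_pct / 100


def effective_offset(std, sensitivity, reference_mean,
                     min_abs_diff=0.0, min_rel_diff_pct=0.0):
    """Largest of the sensitivity term and the two minimum difference floors."""
    return max(
        10 / sensitivity * std,
        min_abs_diff,
        relative_floor(reference_mean, min_rel_diff_pct),
    )


# Sensitivity term: 10/2 * 0.1 = 0.5; relative floor: 10% of 10 = 1.0.
offset = effective_offset(std=0.1, sensitivity=2, reference_mean=10,
                          min_rel_diff_pct=10)
interval = (10 - offset, 10 + offset)
print(offset, interval)

# The relative floor scales with the reference mean; an absolute floor does not.
print(relative_floor(1000, 10))
```

Here the interval widens from (9.5, 10.5) to (9, 11), so datapoints between 9 and 11 are not flagged even though the reference data is tightly clustered.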

Numeric anomaly validator configuration wizard - Config.

Reference source

For information on how you configure the reference source, refer to Reference Source.

Validator Example

The numeric anomaly validator in this example is configured to compare today's window with last week's data. We want to be fairly sensitive to deviations in the current window.

Configure the validator with the following:

Under Config,
1. Select Metric: Count.
2. Set the Sensitivity to 4.
Select all defaults for the following steps before Reference Source Config.
Under Reference Source Config,
1. Set the Window offset to 1.
2. Set the Number of Windows to 7.
Under Threshold,
1. Select Dynamic Threshold.
2. Choose the Narrow preset.

With this configuration, the validator will summarize the data with the following steps:

1. Compute the mean and std of last week's data.
2. Create an interval equal to (mean - 10/4 * std, mean + 10/4 * std).
3. Count all the values in the current window that are outside the interval defined in Step 2.
4. Pass the count from Step 3 to Dynamic Thresholds, with the sensitivity of the selected Narrow preset.
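The first three steps can be traced with a small sketch on made-up data. Several things here are assumptions for illustration: the daily values are invented, a Window offset of 1 with 7 Windows is read as "the seven daily windows immediately before today's", the reference windows are pooled into one list, and the population standard deviation is used. The Dynamic Threshold step is not modeled.

```python
from statistics import fmean, pstdev

# Hypothetical daily windows, oldest first; the last entry is today's window.
daily_windows = [
    [98, 102], [100, 100], [99, 101], [97, 103],
    [100, 100], [98, 102], [99, 101],
    [101, 95, 110, 99, 104],  # today
]

window_offset = 1      # assumption: skip back one window from the current one
number_of_windows = 7
sensitivity = 4

# Step 1: pool the reference windows and compute mean and std.
start = len(daily_windows) - window_offset - number_of_windows
stop = len(daily_windows) - window_offset
reference = [x for window in daily_windows[start:stop] for x in window]
mean = fmean(reference)
std = pstdev(reference)

# Step 2: build the interval (mean - 10/4 * std, mean + 10/4 * std).
offset = 10 / sensitivity * std
low, high = mean - offset, mean + offset

# Step 3: count the values in the current window that fall outside the interval.
today = daily_windows[-1]
count = sum(1 for x in today if x < low or x > high)
print(f"interval=({low:.2f}, {high:.2f}) count={count}")
```

With this data the reference mean is 100 and the interval is roughly (95.88, 104.12), so the values 95 and 110 are counted. The resulting count is the value that Step 4 would hand to the Dynamic Threshold.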