Numeric anomaly
Identify numeric anomalies in your data with machine learning algorithms.
Validator overview
Validate individual field values for every datapoint Validio reads by comparing each value with the distribution of values of a field in a reference source. Dynamic anomaly bounds are configured with the sensitivity parameter.
The Numeric anomaly validator reports anomalies as either a count or a percentage:
- Count: The number of datapoints identified as anomalies in each window.
- Percentage: The share of datapoints identified as anomalies in each window.
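Assuming each datapoint in a window has already been flagged as anomalous or not, the two metrics can be sketched in Python as below. The input format (window label mapped to a list of booleans) and the `window_metrics` function are hypothetical illustrations, not part of Validio.

```python
from typing import Dict, List


def window_metrics(flags_by_window: Dict[str, List[bool]]) -> Dict[str, Dict[str, float]]:
    """Return the anomaly count and percentage for each window.

    flags_by_window maps a (hypothetical) window label to one boolean per
    datapoint, True when that datapoint was identified as an anomaly.
    """
    metrics = {}
    for window, flags in flags_by_window.items():
        count = sum(flags)  # each True counts as 1
        # Share of anomalous datapoints in the window, as a percentage.
        percentage = 100.0 * count / len(flags) if flags else 0.0
        metrics[window] = {"count": count, "percentage": percentage}
    return metrics


example = {
    "window_1": [False, True, False, False],
    "window_2": [True, True, False, False, False],
}
print(window_metrics(example))
# {'window_1': {'count': 1, 'percentage': 25.0}, 'window_2': {'count': 2, 'percentage': 40.0}}
```

The same anomalies produce different metric values depending on window size: one anomaly in four datapoints is 25%, while two in five is 40%.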
Configuration
Step | Required | Parameters | Options |
---|---|---|---|
Validator type | ✅ | Numeric anomaly | - |
Config | ✅ | Metric | Count, Percentage |
Config | ✅ | Sensitivity | Enter a numeric value |
Config | | Advanced config | Minimum absolute difference, Minimum number of reference datapoints, Minimum relative difference percent |
Config | | Backfill | Initialize with backfill (checkbox) |
Source fields | ✅ | Field | List of source fields with numeric data types |
Source config | ✅ | Segmentation | Select a configured Segmentation, or Unsegmented (default) |
Source config | ✅ | Window | Select a configured Window |
Source config | | Filter | No filter (default), Boolean, Enum, Null (*1), String, Threshold |
Reference source config | ✅ | Sources | Select a Source to use as reference source |
Reference source config | ✅ | Field | Select a valid field from your reference source |
Reference source config | ✅ | Window | Select a configured Window |
Reference source config | ✅ | Window offset | Select how many Windows to offset by |
Reference source config | ✅ | Number of Windows | Select how many Windows to include |
Reference source config | | Filter | No filter (default), Enum, Null (*1), String, Threshold |
Threshold | ✅ | Threshold type | Fixed threshold, Dynamic threshold |
Threshold | ✅(*2) | Operator | Less than, Less than or equal, Equal, Not equal, Greater than, Greater than or equal |
Threshold | ✅(*2) | Value | Numeric threshold value to compare against |
Threshold | ✅(*3) | Sensitivity | Enter a numeric value |
Threshold | ✅(*3) | Decision bounds type | Upper, Lower, Upper and lower (default) |
*1 Only applicable for nullable columns.
*2 Only applicable for fixed thresholds.
*3 Only applicable for dynamic thresholds.
Configuration details
Sensitivity
A higher sensitivity narrows the range of accepted values, so more datapoints are identified as anomalies. Conversely, a lower sensitivity widens the range of accepted values, so fewer datapoints are identified as anomalies.
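The model behind the dynamic bounds is not described on this page, so the following Python sketch only illustrates the trade-off under a loudly stated assumption: it pretends the bounds are the reference mean plus or minus (sample standard deviation / sensitivity). The formula and the function names are hypothetical, not Validio's algorithm.

```python
import statistics
from typing import List, Tuple


def dynamic_bounds(reference: List[float], sensitivity: float) -> Tuple[float, float]:
    """HYPOTHETICAL bounds: reference mean +/- (sample standard deviation / sensitivity)."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    mean = statistics.mean(reference)
    std = statistics.stdev(reference)  # needs at least two reference values
    half_width = std / sensitivity  # higher sensitivity -> narrower accepted range
    return mean - half_width, mean + half_width


def flag_anomalies(values: List[float], reference: List[float], sensitivity: float) -> List[bool]:
    """Flag each value that falls outside the hypothetical bounds."""
    lower, upper = dynamic_bounds(reference, sensitivity)
    return [v < lower or v > upper for v in values]


reference = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0]
values = [10.5, 13.5, 7.0, 11.8]
for sensitivity in (0.5, 1.0, 2.0):
    flagged = sum(flag_anomalies(values, reference, sensitivity))
    print(f"sensitivity={sensitivity}: {flagged} of {len(values)} values flagged")
```

With this reference data, sensitivities of 0.5, 1.0, and 2.0 flag 1, 2, and 3 of the four values respectively: the accepted range shrinks as sensitivity grows.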
Advanced config
Minimum absolute difference:
The minimum absolute difference between the field value and the mean of the reference distribution for the datapoint to be considered an anomaly.
For example, if set to 10, a datapoint is considered an anomaly only if it falls outside the dynamic bounds and its difference from the mean of the reference distribution is greater than 10. In effect, the parameter ignores any deviation within this difference.
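The rule above combines two conditions. A minimal Python sketch, assuming the dynamic bounds have already been computed and are passed in as plain numbers (the bounds values and the `is_anomaly` function are hypothetical):

```python
def is_anomaly(value: float, reference_mean: float, lower: float, upper: float,
               min_absolute_difference: float) -> bool:
    """Return True only if the value is outside the bounds AND far enough from the mean."""
    outside_bounds = value < lower or value > upper
    # Deviations of min_absolute_difference or less are ignored, even outside the bounds.
    exceeds_min_difference = abs(value - reference_mean) > min_absolute_difference
    return outside_bounds and exceeds_min_difference


# Hypothetical reference mean of 100, dynamic bounds of [95, 105], parameter set to 10.
print(is_anomaly(108, 100, 95, 105, 10))  # False: outside bounds, but only 8 from the mean
print(is_anomaly(112, 100, 95, 105, 10))  # True: outside bounds and 12 from the mean
print(is_anomaly(103, 100, 95, 105, 10))  # False: inside bounds
```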
Minimum number of reference datapoints:
The minimum number of datapoints the reference source must contain before the metric is calculated.
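As a rough illustration of this guard, the Python sketch below skips the reference statistics when too few datapoints are available. The `reference_stats` function and its (mean, standard deviation) return value are hypothetical; they only show the gating behavior described above.

```python
import statistics
from typing import Optional, Sequence, Tuple


def reference_stats(reference: Sequence[float],
                    min_reference_datapoints: int) -> Optional[Tuple[float, float]]:
    """Return (mean, population standard deviation) of the reference data,
    or None when there are too few datapoints to calculate the metric."""
    # Always require at least one value so that statistics.mean() cannot fail.
    if len(reference) < max(min_reference_datapoints, 1):
        return None  # not enough reference data: skip the metric calculation
    return statistics.mean(reference), statistics.pstdev(reference)


print(reference_stats([4.0, 6.0], min_reference_datapoints=3))  # None
print(reference_stats([4.0, 6.0, 8.0], min_reference_datapoints=3))  # (mean, std) tuple, mean 6.0
```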
Minimum relative difference percent:
The minimum difference for a datapoint to be considered an anomaly, expressed in relative terms: the absolute difference between the field value and the reference mean, divided by the absolute value of the mean of the reference data.
For example, if the mean of the reference distribution is 10 and the parameter is set to 10%, datapoints falling between 9 and 11 are not considered anomalies.
We recommend that you use this option instead of minimum absolute difference when you are more interested in the relative difference from the reference mean than in the absolute difference.
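The worked example for minimum relative difference percent can be checked with a short Python sketch. It evaluates only the relative-difference condition (the dynamic bounds still apply separately), and it assumes a strict "greater than" comparison, mirroring the wording used for minimum absolute difference. The function name is hypothetical.

```python
def exceeds_min_relative_difference(value: float, reference_mean: float,
                                    min_relative_difference_percent: float) -> bool:
    """True if |value - mean| / |mean|, as a percentage, is above the configured minimum."""
    if reference_mean == 0:
        raise ValueError("relative difference is undefined for a zero reference mean")
    relative_difference = abs(value - reference_mean) / abs(reference_mean)
    return relative_difference * 100 > min_relative_difference_percent


# Reference mean of 10 with the parameter set to 10%, as in the example.
for value in (8.5, 9.5, 10.5, 11.5):
    print(value, exceeds_min_relative_difference(value, 10.0, 10.0))
```

Only 8.5 and 11.5 exceed the 10% minimum; 9.5 and 10.5 fall inside the 9 to 11 range and are not considered anomalies. Because the threshold scales with the reference mean, the same 10% setting would ignore deviations of up to 100 around a mean of 1,000.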
Reference source
For information on how you configure the reference source, refer to Reference Source.