Numeric Distribution
Compares numeric statistics between a source dataset and a reference dataset.
Validator overview
You can use the numeric distribution Validator to make sure that your numeric fields have stable properties over time.
Configuration
Step | Required | Parameters | Options |
---|---|---|---|
Validator type | ✅ | Numeric distribution | - |
Config | ✅ | Metric | Relative entropy, Mean ratio, Maximum ratio, Minimum ratio, Standard deviation ratio, Sum ratio |
Config | | Backfill | Initialize with backfill (checkbox) |
Source config | ✅ | Field | List of source fields with numeric data types |
Source config | ✅ | Segmentation | Select a configured Segmentation, or Unsegmented (default) |
Source config | ✅ | Window | Select a configured Window |
Source config | | Filter | No filter (default), Enum, Null (*1), String, Threshold filter |
Reference source config | ✅ | Source | Specify a Source to use as reference source |
Reference source config | ✅ | Field | List of reference source fields with numeric data types |
Reference source config | ✅ | Window | Select a configured Window |
Reference source config | ✅ | Window offset | Select how many Windows you want to offset by |
Reference source config | ✅ | Number of Windows | Select how many Windows to include |
Reference source config | | Filter | No filter (default), Boolean, Enum, Null (*1), String, Threshold filter |
Threshold | ✅ | Threshold type | Fixed threshold, Dynamic threshold |
Threshold | ✅(*2) | Operator | Less than, Less than or equal, Equal, Not equal, Greater than, Greater than or equal |
Threshold | ✅(*3) | Value | Specify the numeric value to validate the threshold against |
Threshold | ✅(*4) | Sensitivity | Enter a numeric value |
Threshold | ✅(*4) | Decision bounds type | Upper, Lower, Upper and lower (default) |
*1 Only applicable for nullable columns
*2 Only applicable for Fixed thresholds
*3 Only applicable for Fixed thresholds
*4 Only applicable for Dynamic thresholds
Configuration details
Relative entropy
In Validio, relative entropy is based on the Kullback-Leibler divergence measure.
Relative entropy is presented as a percentage where:
0% means identical empirical distributions.
100% means maximal difference in empirical distributions.
You can use relative entropy to validate distribution shifts in your data over time, or to compare the distributions of two data sets.
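To build intuition for the underlying measure, here is a minimal sketch of the Kullback-Leibler divergence between two numeric samples. It is not Validio's implementation: the function names, the equal-width binning, the number of bins, and the smoothing constant `eps` are all assumptions made for illustration. The sketch returns the raw divergence (0 for identical histograms, larger for more different ones) and does not attempt the scaling to a 0% to 100% value, because this page does not describe how that scaling is done.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp `hi` into the last bin
        counts[idx] += 1
    return counts


def kl_divergence(source, reference, bins=10, eps=1e-9):
    """Raw KL divergence D(source || reference) over shared histogram bins.

    `bins` and `eps` are illustrative choices, not Validio settings.
    """
    # Use the same bin edges for both samples so the bins are comparable.
    lo = min(min(source), min(reference))
    hi = max(max(source), max(reference))
    p_counts = histogram(source, lo, hi, bins)
    q_counts = histogram(reference, lo, hi, bins)

    # Additive smoothing keeps empty bins from producing log(0) or x/0.
    p_total = sum(p_counts) + eps * bins
    q_total = sum(q_counts) + eps * bins

    divergence = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        divergence += p * math.log(p / q)
    return divergence
```

Calling `kl_divergence(values, values)` on the same sample gives 0, while two samples that fall into entirely different bins give a large positive value.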
Ratio metrics
Each ratio metric calculates the ratio of the mean, sum, maximum, minimum, or standard deviation between the source dataset and the reference dataset:
Ratio = source metric/reference metric
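The formula above can be sketched in a few lines of Python. This is an illustration only: the `METRICS` mapping and the `ratio` function are hypothetical names, the use of the population standard deviation is an assumption (this page does not say which variant is used), and raising an error for a zero reference metric is one possible choice rather than documented behavior.

```python
import math
import statistics

# Hypothetical mapping from metric name to an aggregation function.
METRICS = {
    "mean": statistics.fmean,
    "sum": math.fsum,
    "maximum": max,
    "minimum": min,
    # Assumption: population standard deviation.
    "standard deviation": statistics.pstdev,
}


def ratio(metric, source, reference):
    """Return source metric / reference metric for the named metric."""
    aggregate = METRICS[metric]
    reference_value = aggregate(reference)
    if reference_value == 0:
        # Assumption: treat an undefined ratio as an error.
        raise ZeroDivisionError(f"reference {metric} is zero")
    return aggregate(source) / reference_value
```

For example, with `source = [2, 4, 6]` and `reference = [1, 2, 3]`, every one of the five ratios is 2, because each source value is exactly twice the corresponding reference value. A ratio close to 1 means the source and reference agree on that statistic.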
Reference source
For information on how to configure the reference source, refer to the Reference source documentation.
Sensitivity
Higher sensitivity means that the accepted range of values is narrower, which identifies more anomalies. Conversely, lower sensitivity values imply a wider range of accepted values, which identifies fewer anomalies.
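The relationship between sensitivity and the accepted range can be illustrated with a toy model. This is not how Validio computes dynamic thresholds, which this page does not describe. The sketch assumes, purely for illustration, that the accepted range is centered on the mean of past metric values and that its half-width is `3 * standard deviation / sensitivity`. The function names and the constant `3.0` are hypothetical. The `bounds` argument mirrors the Decision bounds type options from the configuration table.

```python
import statistics


def accepted_range(history, sensitivity, bounds="upper and lower"):
    """Toy accepted range: narrower as `sensitivity` grows (assumed model)."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    center = statistics.fmean(history)
    spread = statistics.pstdev(history)
    half_width = 3.0 * spread / sensitivity  # assumption, not Validio's formula

    # A one-sided bounds type leaves the other side unbounded.
    lower = center - half_width if bounds in ("lower", "upper and lower") else float("-inf")
    upper = center + half_width if bounds in ("upper", "upper and lower") else float("inf")
    return lower, upper


def is_anomaly(value, history, sensitivity, bounds="upper and lower"):
    """Flag `value` when it falls outside the toy accepted range."""
    lower, upper = accepted_range(history, sensitivity, bounds)
    return value < lower or value > upper
```

With `history = [1.0, 1.1, 0.9, 1.0, 1.0]`, the value `1.15` is accepted at sensitivity 1 but flagged at sensitivity 3, which matches the behavior described above: a higher sensitivity narrows the range and identifies more anomalies. With `bounds="lower"`, the same value is never flagged, because only values below the range count.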