These docs are for v2.1.

Numeric distribution

Numerical reference statistics between two datasets.

Validator overview

You can use the numeric distribution Validator to check that the distributions of your numeric fields stay stable over time, by comparing statistics of a source field against those of a reference source.

Configuration

| Step | Required | Parameters | Options |
| --- | --- | --- | --- |
| Validator type | | Numeric distribution | - |
| Config | | Metric | Relative entropy; Mean ratio; Maximum ratio; Minimum ratio; Standard deviation ratio; Sum ratio |
| Config | | Backfill | Initialize with backfill (checkbox) |
| Source config | | Field | List of source fields with numeric data types |
| Source config | | Segmentation | Select a configured Segmentation, or Unsegmented (default) |
| Source config | | Window | Select a configured Window |
| Source config | | Filter | No filter (default); Enum; Null (*1); String; Threshold filter |
| Reference source config | | Source | Specify a Source to use as reference source |
| Reference source config | | Field | List of reference source fields with numeric data types |
| Reference source config | | Window | Select a configured Window |
| Reference source config | | Window offset | Select how many Windows you want to offset by |
| Reference source config | | Number of Windows | Select how many Windows to include |
| Reference source config | | Filter | No filter (default); Boolean; Enum; Null (*1); String; Threshold filter |
| Threshold | | Threshold type | Fixed threshold; Dynamic threshold |
| Threshold | ✅ (*2) | Operator | Less than; Less than or equal; Equal; Not equal; Greater than; Greater than or equal |
| Threshold | ✅ (*3) | Value | Specify numeric value to validate threshold on |
| Threshold | ✅ (*4) | Sensitivity | Enter a numeric value |
| Threshold | ✅ (*4) | Decision bounds type | Upper; Lower; Upper and lower (default) |

*1 Only applicable for nullable columns

*2 Only applicable for Fixed thresholds

*3 Only applicable for Fixed thresholds

*4 Only applicable for Dynamic thresholds
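To illustrate how the Operator and Value parameters of a Fixed threshold fit together, here is a minimal Python sketch. It assumes that an incident is raised when "metric Operator Value" holds; this page does not state that rule, and OPERATORS and breaches_fixed_threshold are hypothetical names, not part of Validio.

```python
import operator
from typing import Callable

# Hypothetical mapping from the Operator options in the table to Python comparisons.
OPERATORS: dict[str, Callable[[float, float], bool]] = {
    "Less than": operator.lt,
    "Less than or equal": operator.le,
    "Equal": operator.eq,
    "Not equal": operator.ne,
    "Greater than": operator.gt,
    "Greater than or equal": operator.ge,
}


def breaches_fixed_threshold(metric_value: float, op_name: str, threshold_value: float) -> bool:
    """Return True if `metric_value <op> threshold_value` holds.

    Assumption: a comparison that holds is treated as a threshold breach.
    """
    try:
        compare = OPERATORS[op_name]
    except KeyError:
        raise ValueError(f"unknown operator: {op_name!r}") from None
    return compare(metric_value, threshold_value)


# Example: flag a relative entropy above 10%.
print(breaches_fixed_threshold(12.5, "Greater than", 10.0))  # True
print(breaches_fixed_threshold(8.0, "Greater than", 10.0))   # False
```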

Configuration details

Relative entropy

In Validio, relative entropy is based on the Kullback-Leibler (KL) divergence measure.

Relative entropy is presented as a percentage where:

  • 0% means identical empirical distributions.
  • 100% means maximal difference in empirical distributions.

You can use relative entropy to detect distribution shifts in your data over time, or to compare the distributions of two datasets.
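The sketch below shows one way a KL-based relative entropy could be computed and expressed as a percentage. It is not Validio's implementation: the equal-width binning, the smoothing constant eps, the direction D_KL(source || reference), and the 1 - exp(-D) mapping onto 0-100% are all assumptions, chosen so that identical distributions score 0% and increasingly different ones approach 100%.

```python
import math
from typing import Sequence


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> list[float]:
    """Normalized histogram of `values` over `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins if hi > lo else 1.0
    for v in values:
        idx = int((v - lo) / width)
        # Clamp into [0, bins - 1]; v == hi would otherwise fall one bin past the end.
        counts[min(max(idx, 0), bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-9) -> float:
    """D_KL(P || Q) in nats over bins where P has mass; eps keeps empty Q bins finite."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def relative_entropy_pct(source: Sequence[float], reference: Sequence[float], bins: int = 10) -> float:
    """Hypothetical 0-100% score: 100 * (1 - exp(-D_KL(source || reference)))."""
    if not source or not reference:
        raise ValueError("both datasets must be non-empty")
    # Shared bin edges so both histograms describe the same value ranges.
    lo = min(min(source), min(reference))
    hi = max(max(source), max(reference))
    p = histogram(source, lo, hi, bins)
    q = histogram(reference, lo, hi, bins)
    divergence = max(0.0, kl_divergence(p, q))  # smoothing can push the sum marginally below 0
    return 100.0 * (1.0 - math.exp(-divergence))


same = [1.0, 2.0, 2.5, 3.0, 4.0]
print(relative_entropy_pct(same, same))                     # 0.0
print(relative_entropy_pct([v + 10 for v in same], same))   # close to 100
```

Mapping through 1 - exp(-D) keeps the score bounded, which the raw KL divergence is not; any monotone squashing function would serve the same illustrative purpose.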

Ratio metrics

The ratio metrics calculate the ratio of the mean, sum, maximum, minimum, or standard deviation of the source field to the same statistic of the reference field:

Ratio = source metric/reference metric
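A minimal Python sketch of the ratio formula above, assuming each field is available as a list of numbers. Which standard deviation estimator Validio uses is not documented here, so the population standard deviation is an assumption, as is returning NaN when the reference metric is zero. STATISTICS and ratio_metric are hypothetical names.

```python
import math
import statistics
from typing import Callable, Sequence

# Hypothetical mapping from the ratio metric names to per-field statistics.
# Assumption: population standard deviation (pstdev).
STATISTICS: dict[str, Callable[[Sequence[float]], float]] = {
    "Mean ratio": statistics.fmean,
    "Sum ratio": math.fsum,
    "Maximum ratio": max,
    "Minimum ratio": min,
    "Standard deviation ratio": statistics.pstdev,
}


def ratio_metric(name: str, source: Sequence[float], reference: Sequence[float]) -> float:
    """Ratio = source metric / reference metric."""
    stat = STATISTICS[name]
    denominator = stat(reference)
    if denominator == 0:
        # Assumption: an undefined ratio is reported as NaN rather than raising.
        return math.nan
    return stat(source) / denominator


source = [10.0, 20.0, 30.0]
reference = [10.0, 10.0, 10.0]
print(ratio_metric("Mean ratio", source, reference))     # 2.0
print(ratio_metric("Maximum ratio", source, reference))  # 3.0
print(ratio_metric("Standard deviation ratio", source, reference))  # nan
```

A ratio of 1.0 means the statistic is the same in both datasets, so a threshold on a ratio metric is effectively a bound on how far the source may drift from the reference.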

Reference source

For information on how to configure the reference source, see Reference source.

Sensitivity

Higher sensitivity means that the accepted range of values is narrower, which identifies more anomalies. Conversely, lower sensitivity values imply a wider range of accepted values, which identifies fewer anomalies.
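To make the effect of sensitivity concrete, the following sketch derives decision bounds from past metric values and narrows them as sensitivity grows. Only the direction of that effect comes from this page; the band of mean plus or minus (k / sensitivity) standard deviations, the constant k = 3, and the names dynamic_bounds and is_anomaly are hypothetical. The bounds_type argument mirrors the Decision bounds type options: with Upper only the upper bound is enforced, and with Lower only the lower bound.

```python
import math
import statistics
from typing import Sequence


def dynamic_bounds(history: Sequence[float], sensitivity: float,
                   bounds_type: str = "Upper and lower") -> tuple[float, float]:
    """Hypothetical bounds: mean +/- (k / sensitivity) * population stdev of past values."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    k = 3.0  # illustrative constant, not a documented Validio value
    center = statistics.fmean(history)
    half_width = (k / sensitivity) * statistics.pstdev(history)
    lower, upper = center - half_width, center + half_width
    if bounds_type == "Upper":
        lower = -math.inf  # only values above the upper bound count as anomalies
    elif bounds_type == "Lower":
        upper = math.inf   # only values below the lower bound count as anomalies
    return lower, upper


def is_anomaly(value: float, history: Sequence[float], sensitivity: float,
               bounds_type: str = "Upper and lower") -> bool:
    """True if `value` falls outside the hypothetical decision bounds."""
    lower, upper = dynamic_bounds(history, sensitivity, bounds_type)
    return value < lower or value > upper


history = [10.0, 11.0, 9.0, 10.0, 10.0]
print(is_anomaly(11.5, history, sensitivity=1.0))  # False: wide band
print(is_anomaly(11.5, history, sensitivity=3.0))  # True: narrow band
print(is_anomaly(11.5, history, sensitivity=3.0, bounds_type="Lower"))  # False
```

With the same history, raising sensitivity from 1 to 3 shrinks the band to a third of its width, so the value 11.5 moves from accepted to anomalous.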