
Numeric distribution

Compare numeric statistics between a source dataset and a reference dataset.

Validator overview

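You can use the numeric distribution Validator to make sure that your numeric fields keep stable statistical properties over time. The Validator compares a metric computed on a source field with the same metric computed on a reference source.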

Configuration

| Step | Required | Parameters | Options |
|---|---|---|---|
| Validator type | | Numeric distribution | |
| Config | | Metric | Relative entropy; Mean ratio; Maximum ratio; Minimum ratio; Standard deviation ratio; Sum ratio |
| Config | | Backfill | Initialize with backfill (checkbox) |
| Source config | | Field | List of source fields with numeric data types |
| Source config | | Segmentation | Select a configured Segmentation, or Unsegmented (default) |
| Source config | | Window | Select a configured Window |
| Source config | | Filter | No filter (default); Enum; Null (*1); String; Threshold filter |
| Reference source config | | Source | Specify a Source to use as the reference source |
| Reference source config | | Field | List of reference source fields with numeric data types |
| Reference source config | | Window | Select a configured Window |
| Reference source config | | Window offset | Select how many Windows to offset by |
| Reference source config | | Number of Windows | Select how many Windows to include |
| Reference source config | | Filter | No filter (default); Boolean; Enum; Null (*1); String; Threshold filter |
| Threshold | | Threshold type | Fixed threshold; Dynamic threshold |
| Threshold | ✅ (*2) | Operator | Less than; Less than or equal; Equal; Not equal; Greater than; Greater than or equal |
| Threshold | ✅ (*3) | Value | Specify the numeric value to validate the threshold against |
| Threshold | ✅ (*4) | Sensitivity | Enter a numeric value |
| Threshold | ✅ (*4) | Decision bounds type | Upper; Lower; Upper and lower (default) |

*1 Only applicable to nullable columns

*2 Only applicable to Fixed thresholds

*3 Only applicable to Fixed thresholds

*4 Only applicable to Dynamic thresholds
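A Fixed threshold combines an Operator with a Value. The sketch below shows one way to evaluate that pair against a computed metric. It assumes that the Operator/Value pair describes the condition that flags an anomaly (for example, Greater than 10 flags a relative entropy above 10%); that interpretation, the `OPERATORS` mapping, and the `breaches_fixed_threshold` function are hypothetical illustrations, not Validio's implementation.

```python
import operator

# Hypothetical mapping from the Operator options to Python comparison functions.
OPERATORS = {
    "Less than": operator.lt,
    "Less than or equal": operator.le,
    "Equal": operator.eq,
    "Not equal": operator.ne,
    "Greater than": operator.gt,
    "Greater than or equal": operator.ge,
}


def breaches_fixed_threshold(metric_value: float, op_name: str, value: float) -> bool:
    """Return True if `metric_value <op> value` holds.

    Assumption: a True result means the metric breaches the threshold
    and should be flagged as an anomaly.
    """
    return OPERATORS[op_name](metric_value, value)


# Relative entropy of 12.5% against "Greater than 10": flagged.
print(breaches_fixed_threshold(12.5, "Greater than", 10.0))
# Mean ratio of 0.98 against "Less than 0.9": not flagged.
print(breaches_fixed_threshold(0.98, "Less than", 0.9))
```

Because metric values are floating-point numbers, the Equal and Not equal operators are mostly useful for values that can be represented exactly.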

Configuration details

Relative entropy

In Validio, relative entropy is based on the Kullback-Leibler divergence.

Relative entropy is presented as a percentage where:

  • 0% means identical empirical distributions.
  • 100% means maximal difference in empirical distributions.
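To illustrate the underlying measure, the sketch below bins two datasets into shared equal-width histograms and computes the Kullback-Leibler divergence D_KL(P || Q) = sum of p_i * ln(p_i / q_i), where P is the source distribution and Q the reference distribution. The binning strategy, the `eps` smoothing, and the function names are assumptions for illustration. This page does not describe how Validio bins values or how it scales the divergence to the 0 to 100% range, so the sketch stops at the raw divergence, which is 0 for identical empirical distributions and grows as they diverge.

```python
import math


def histogram(values, lo, hi, bins):
    """Fraction of values in each of `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i).

    Bins where p_i == 0 contribute nothing. `eps` (a hypothetical smoothing
    constant) keeps the result finite when a reference bin is empty.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def empirical_kl(source, reference, bins=10):
    """Divergence of the source distribution from the reference distribution,
    using shared bins over the combined range of both datasets."""
    lo = min(min(source), min(reference))
    hi = max(max(source), max(reference))
    if hi == lo:
        return 0.0  # every value is identical, so the distributions match
    p = histogram(source, lo, hi, bins)
    q = histogram(reference, lo, hi, bins)
    return kl_divergence(p, q)


print(empirical_kl([1, 2, 3, 4], [1, 2, 3, 4]))  # identical data: 0.0
print(empirical_kl([1, 2, 3, 4], [5, 6, 7, 8]))  # shifted data: large positive value
```

Note that the divergence is asymmetric: swapping the source and reference datasets generally gives a different value.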

Note: You can use relative entropy to detect distribution shifts in a field over time, or to compare the distributions of two datasets.

Ratio metrics

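The ratio metrics divide the mean, sum, maximum, minimum, or standard deviation of the source field by the same statistic of the reference field:

Ratio = source metric / reference metric

A ratio of 1 means that the statistic is the same in both datasets.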
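The formula can be sketched with Python's `statistics` module. The `STATISTICS` mapping and `ratio_metric` function are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population standard deviation is an assumption, since this page does not specify which one Validio uses.

```python
import statistics

# Hypothetical mapping from each ratio Metric option to the statistic it compares.
STATISTICS = {
    "Mean ratio": statistics.mean,
    "Sum ratio": sum,
    "Maximum ratio": max,
    "Minimum ratio": min,
    "Standard deviation ratio": statistics.stdev,  # assumption: sample standard deviation
}


def ratio_metric(metric: str, source: list[float], reference: list[float]) -> float:
    """Ratio = source metric / reference metric.

    Raises ZeroDivisionError if the reference statistic is 0.
    """
    stat = STATISTICS[metric]
    return stat(source) / stat(reference)


source = [10, 12, 14]
reference = [10, 10, 10]
print(ratio_metric("Mean ratio", source, reference))     # 12 / 10
print(ratio_metric("Maximum ratio", source, reference))  # 14 / 10
```

In this example, "Standard deviation ratio" would raise ZeroDivisionError because the reference values are constant, so a reference with zero spread needs separate handling.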

Reference source

For details on how to configure the reference source, see Reference source.

Sensitivity

A higher sensitivity narrows the range of accepted values, so more values are flagged as anomalies. A lower sensitivity widens the range of accepted values, so fewer values are flagged. Sensitivity applies only to Dynamic thresholds.
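This page does not describe the model behind Dynamic thresholds, so the following is only a hypothetical sketch of the relationship described above. It centers an accepted band on the mean of past metric values and sets its half-width to `stdev / sensitivity`, so raising the sensitivity narrows the band. The `bounds_type` argument mirrors the Decision bounds type options: a side that is not checked is left unbounded. The band formula and the `dynamic_bounds` and `is_anomaly` functions are assumptions, not Validio's algorithm.

```python
import math
import statistics


def dynamic_bounds(history, sensitivity, bounds_type="Upper and lower"):
    """Return the (lower, upper) accepted range for the next metric value.

    Hypothetical rule: center = mean(history), half-width = stdev(history) / sensitivity.
    Requires sensitivity > 0 and at least two historical values.
    """
    center = statistics.mean(history)
    half_width = statistics.stdev(history) / sensitivity
    lower = center - half_width if bounds_type in ("Lower", "Upper and lower") else -math.inf
    upper = center + half_width if bounds_type in ("Upper", "Upper and lower") else math.inf
    return lower, upper


def is_anomaly(value, history, sensitivity, bounds_type="Upper and lower"):
    """Flag a value that falls outside the accepted range."""
    lower, upper = dynamic_bounds(history, sensitivity, bounds_type)
    return value < lower or value > upper


history = [1.0, 1.1, 0.9, 1.0, 1.0]  # past Mean ratio values (made-up example data)
print(is_anomaly(1.05, history, sensitivity=1))  # wider band: not flagged
print(is_anomaly(1.05, history, sensitivity=2))  # narrower band: flagged
print(is_anomaly(0.5, history, sensitivity=1, bounds_type="Upper"))  # lower side unchecked: not flagged
```

With this history, the standard deviation is about 0.071, so a sensitivity of 1 accepts roughly 0.93 to 1.07, while a sensitivity of 2 accepts roughly 0.96 to 1.04.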