Numeric distribution

Compares numeric statistics of a field between two datasets: a source and a reference source.

Validator overview

You can use the numeric distribution Validator to make sure that your numeric fields have stable properties over time.

Configuration

| Step | Required | Parameters | Options |
|---|---|---|---|
| Validator type | ✅ | Numeric distribution | |
| Config | ✅ | Metric | Relative entropy; Mean ratio; Maximum ratio; Minimum ratio; Standard deviation ratio; Sum ratio |
| Config | | Backfill | Initialize with backfill (checkbox) |
| Source config | ✅ | Field | List of source fields with numeric data types |
| Source config | ✅ | Segmentation | Select a configured Segmentation, or Unsegmented (default) |
| Source config | ✅ | Window | Select a configured Window |
| Source config | | Filter | No filter (default); Enum; Null (*1); String; Threshold filter |
| Reference source config | ✅ | Source | Specify a Source to use as reference source |
| Reference source config | ✅ | Field | List of reference source fields with numeric data types |
| Reference source config | ✅ | Window | Select a configured Window |
| Reference source config | ✅ | Window offset | Select how many Windows you want to offset by |
| Reference source config | ✅ | Number of Windows | Select how many Windows to include |
| Reference source config | | Filter | No filter (default); Boolean; Enum; Null (*1); String; Threshold filter |
| Threshold | ✅ | Threshold type | Fixed threshold; Dynamic threshold |
| Threshold | ✅ (*2) | Operator | Less than; Less than or equal; Equal; Not equal; Greater than; Greater than or equal |
| Threshold | ✅ (*3) | Value | Specify the numeric value to validate the threshold against |
| Threshold | ✅ (*4) | Sensitivity | Enter a numeric value |
| Threshold | ✅ (*4) | Decision bounds type | Upper; Lower; Upper and lower (default) |

*1 Only applicable for nullable columns

*2 Only applicable for Fixed thresholds

*3 Only applicable for Fixed thresholds

*4 Only applicable for Dynamic thresholds
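
To illustrate how the Operator and Value parameters of a Fixed threshold fit together, here is a minimal Python sketch. It is not Validio's API: the `OPERATORS` mapping and the `comparison_holds` function are hypothetical names. The page does not say whether a matching comparison marks a value as valid or as an incident, so the sketch only reports whether the comparison holds.

```python
import operator
from typing import Callable, Dict

# Hypothetical mapping from the operator names listed on this page
# to Python comparison functions.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "Less than": operator.lt,
    "Less than or equal": operator.le,
    "Equal": operator.eq,
    "Not equal": operator.ne,
    "Greater than": operator.gt,
    "Greater than or equal": operator.ge,
}


def comparison_holds(metric_value: float, operator_name: str, threshold_value: float) -> bool:
    """Return True if `metric_value <operator> threshold_value` holds."""
    return OPERATORS[operator_name](metric_value, threshold_value)


if __name__ == "__main__":
    # Example: a mean ratio of 1.3 compared against a fixed value of 1.2.
    print(comparison_holds(1.3, "Greater than", 1.2))
```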

Configuration details

Relative entropy

In Validio, relative entropy is based on the Kullback-Leibler divergence measure.

Relative entropy is presented as a percentage where:

  • 0% means identical empirical distributions.
  • 100% means maximal difference in empirical distributions.

You can use relative entropy to validate distribution shifts in your data over time, or to compare the distributions of two datasets.
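
This page does not state how the Kullback-Leibler divergence is binned or how it is mapped onto the 0% to 100% scale. The Python sketch below is one possible reading, not Validio's implementation. It assumes equal-width histogram bins, add-one smoothing, and the mapping 100 × (1 − exp(−KL)). The function names and the `bins` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def binned_probabilities(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Histogram `values` into equal-width bins on [lo, hi] and normalize."""
    # Assumption: add-one smoothing keeps every bin non-zero so the log is defined.
    counts = [1.0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi falls into the last bin
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def relative_entropy_percent(source: Sequence[float], reference: Sequence[float], bins: int = 10) -> float:
    """Map D(source || reference) onto a 0-100 scale (assumed mapping)."""
    lo = min(min(source), min(reference))
    hi = max(max(source), max(reference))
    if lo == hi:
        return 0.0  # every value is identical, so the distributions match
    p = binned_probabilities(source, lo, hi, bins)
    q = binned_probabilities(reference, lo, hi, bins)
    kl = max(kl_divergence(p, q), 0.0)  # guard against tiny negative rounding error
    return 100.0 * (1.0 - math.exp(-kl))


if __name__ == "__main__":
    same = [1.0, 2.0, 3.0, 4.0]
    print(relative_entropy_percent(same, same))
    print(relative_entropy_percent([1, 2, 3] * 10, [100, 101, 102] * 10))
```

Under this mapping, identical inputs give exactly 0%, and the result approaches 100% as the distributions separate, without ever reaching it.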

Ratio metrics

Ratio metrics calculate the ratio of the mean, sum, maximum, minimum, or standard deviation of the source field to the same statistic of the reference field:

Ratio = source metric / reference metric
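
As a rough illustration of the formula above, the following Python sketch computes each ratio for two lists of numbers. The `METRICS` mapping and the `ratio` function are hypothetical names, not part of Validio. Two details are assumptions that this page does not specify: the sketch uses the population standard deviation, and it returns `None` when the reference metric is zero.

```python
import math
import statistics
from typing import Callable, Dict, Optional, Sequence

# Hypothetical metric names mapped to standard-library aggregations.
METRICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "sum": math.fsum,
    "maximum": max,
    "minimum": min,
    "standard deviation": statistics.pstdev,  # assumption: population std dev
}


def ratio(metric: str, source: Sequence[float], reference: Sequence[float]) -> Optional[float]:
    """Return source metric / reference metric, or None if the reference metric is 0."""
    aggregate = METRICS[metric]
    reference_value = aggregate(reference)
    if reference_value == 0:
        return None  # assumption: the ratio is treated as undefined here
    return aggregate(source) / reference_value


if __name__ == "__main__":
    source = [2.0, 4.0, 6.0]
    reference = [1.0, 2.0, 3.0]
    for name in METRICS:
        print(name, ratio(name, source, reference))
```

A ratio of 1 means the source and reference statistics are equal. Values above 1 mean the source statistic is larger than the reference statistic.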

Reference source

For information on how to configure the reference source, refer to Reference source.

Sensitivity

Higher sensitivity means that the accepted range of values is narrower, which identifies more anomalies. Conversely, lower sensitivity values imply a wider range of accepted values, which identifies fewer anomalies.
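
The relationship between sensitivity and the accepted range can be sketched in Python as follows. This is an illustrative model only, since the page does not describe the algorithm behind Dynamic thresholds. The sketch assumes bounds centred on the mean of past metric values, with a half-width of 3 standard deviations divided by the sensitivity. The function names, the constant 3, and the `bounds_type` strings are all hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple

Bounds = Tuple[Optional[float], Optional[float]]


def decision_bounds(history: Sequence[float], sensitivity: float,
                    bounds_type: str = "upper_and_lower") -> Bounds:
    """Return (lower, upper) bounds; a bound that is not enforced is None."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    center = statistics.fmean(history)
    spread = statistics.pstdev(history)
    # Assumption: higher sensitivity shrinks the accepted range proportionally.
    half_width = 3.0 * spread / sensitivity
    lower = center - half_width if bounds_type in ("lower", "upper_and_lower") else None
    upper = center + half_width if bounds_type in ("upper", "upper_and_lower") else None
    return lower, upper


def is_anomaly(value: float, bounds: Bounds) -> bool:
    """A value is anomalous if it falls outside any enforced bound."""
    lower, upper = bounds
    below = lower is not None and value < lower
    above = upper is not None and value > upper
    return below or above


if __name__ == "__main__":
    history = [1.0, 1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
    for s in (1.0, 3.0):
        bounds = decision_bounds(history, s)
        print(s, bounds, is_anomaly(1.15, bounds))
```

In this model, the same value of 1.15 falls inside the bounds at sensitivity 1 but outside them at sensitivity 3. With the bounds type set to lower only, values above the range are never flagged.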