Categorical Reference

Categorical reference statistics between two datasets

Configuration parameters

Parameter name and description: Parameter values
1. Name: Arbitrary string
2. Target feature: List of source features with String data type
3. Computed metric:
  • New categories
  • New categories ratio
  • Removed categories
  • Removed categories ratio
  • Changed categories
  • Changed categories ratio
  • Relative entropy
4. Reference feature: List of reference source features with String data type

Parameter details

New categories

Counts the categories that appear in the target dataset but not in the reference dataset. Example:

[Figure: Target dataset. Records and values in the categorical feature being monitored.]

[Figure: Reference dataset. Records and values in the categorical feature being monitored.]

Example: all values of the monitored categorical feature in the respective datasets.

Compared to the reference dataset, the target dataset has one new categorical value, ‘F’, so the number of new categories is one.

Removed categories

Following the same example, two categorical values that are present in the reference dataset are missing from the target dataset: ‘A’ and ‘B’. The number of removed categories is thus two.

Changed categories

Lastly, following the same example, the number of changed categories is the sum of the new and removed categories, i.e. 1 + 2 = 3.
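The three counts can be sketched with set differences. This is a minimal illustration, not Validio's implementation: the function name `category_changes` and the sample values are hypothetical, chosen only so that ‘F’ is new and ‘A’ and ‘B’ are removed, as in the example.

```python
def category_changes(target_values, reference_values):
    """Compare the distinct categories of a target and a reference feature."""
    target = set(target_values)
    reference = set(reference_values)

    new = target - reference        # in target, not in reference
    removed = reference - target    # in reference, not in target

    return {
        "new": sorted(new),
        "removed": sorted(removed),
        # Changed categories = new categories + removed categories
        "changed_count": len(new) + len(removed),
    }


# Hypothetical sample values (the original example data is not reproduced here)
reference_values = ["A", "B", "C", "D", "E", "C", "D"]
target_values = ["C", "D", "E", "F", "F", "C"]

result = category_changes(target_values, reference_values)
print(result)  # {'new': ['F'], 'removed': ['A', 'B'], 'changed_count': 3}
```

Only distinct values matter for these counts; how often a category occurs does not affect the result.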

Ratio metrics

The ratio metrics are calculated between the two datasets as follows:

Ratio = target metric / reference metric

Relative entropy

Relative entropy in Validio is an adapted implementation of the symmetrised Kullback-Leibler divergence.

Relative entropy is used to detect distribution shifts between a target set and a reference set. It produces a non-negative numerical metric: zero implies identical empirical distributions, and the value grows as the two distributions become increasingly different.
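For intuition, the sketch below computes a plain symmetrised Kullback-Leibler divergence, KL(P‖Q) + KL(Q‖P), over the empirical category frequencies of two lists of values. It is not Validio's adapted implementation, whose details are not given here. The function name `symmetrised_kl` and the `smoothing` parameter are assumptions: smoothing is added so that a category missing from one dataset does not cause a division by zero. Some definitions also halve the sum.

```python
import math
from collections import Counter


def symmetrised_kl(target_values, reference_values, smoothing=1e-9):
    """Symmetrised KL divergence between two empirical categorical distributions.

    `smoothing` is an assumed additive constant that keeps every probability
    strictly positive, so categories missing from one side stay finite.
    """
    target_counts = Counter(target_values)
    reference_counts = Counter(reference_values)
    categories = set(target_counts) | set(reference_counts)

    def distribution(counts):
        # Smoothed relative frequency of each category
        total = sum(counts.values()) + smoothing * len(categories)
        return {c: (counts.get(c, 0) + smoothing) / total for c in categories}

    p = distribution(target_counts)
    q = distribution(reference_counts)

    kl_pq = sum(p[c] * math.log(p[c] / q[c]) for c in categories)
    kl_qp = sum(q[c] * math.log(q[c] / p[c]) for c in categories)
    return kl_pq + kl_qp


# Hypothetical sample values
same = symmetrised_kl(["A", "B", "B"], ["A", "B", "B"])
shifted = symmetrised_kl(["A", "A", "A", "B"], ["A", "B", "B", "B"])
print(same, shifted)
```

Identical empirical distributions give zero, and the value grows as the category frequencies drift apart, which matches the behaviour described above.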


Don’t have experience with relative entropy?

But still want to monitor distribution shifts? No worries! Apply a Smart Alert and monitor changes in relative entropy without having to worry about what the absolute value means.