Categorical Reference
Categorical reference statistics between two datasets
Configuration parameters
Parameter name and description  Parameter values 
1. Name  Arbitrary string 
2. Target feature  List of source features with String data type 
3. Computed metric 

4. Reference feature  List of reference source features with String data type 
Parameter details
New categories
Counts the categories that appear in the target dataset but not in the reference dataset. Example:
Target dataset (records and values in the categorical feature being monitored): C, D, E, F
Reference dataset (records and values in the categorical feature being monitored): A, B, C, D, E
Example: all values of the monitored categorical feature in the respective datasets.
Compared to the reference dataset, the target dataset has one new categorical value, ‘F’. The number of new categories is thus one.
Removed categories
Following the same example as above, two categorical values that are present in the reference dataset are missing from the target dataset: ‘A’ and ‘B’. The number of removed categories is thus two.
Changed categories
Lastly, following the same example as above, the number of changed categories is simply the sum of new and removed categories, i.e. 1 + 2 = 3.
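The three counts above can be reproduced with plain set operations. The following is a minimal sketch using the example values; the function name category_changes and the shape of its result are hypothetical and are not part of Validio's API.

```python
def category_changes(target_values, reference_values):
    """Count new, removed and changed categories between two datasets."""
    target = set(target_values)
    reference = set(reference_values)
    new = target - reference        # present in target only
    removed = reference - target    # present in reference only
    return {
        "new": sorted(new),
        "removed": sorted(removed),
        "new_count": len(new),
        "removed_count": len(removed),
        # Changed categories = new categories + removed categories
        "changed_count": len(new) + len(removed),
    }


# Values taken from the example above
result = category_changes(
    target_values=["C", "D", "E", "F"],
    reference_values=["A", "B", "C", "D", "E"],
)
print(result)
```

Because the counts are based on distinct values, repeated records of the same category do not affect the result.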
Ratio metrics
Calculates the ratio metrics between the two datasets:
Ratio = target metric/reference metric
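As an illustration of the formula, here is a minimal sketch. The underlying metric is not specified on this page, so the example assumes a hypothetical one (the number of distinct categories). Returning None when the reference metric is zero is also an assumption, not documented behaviour.

```python
def ratio_metric(target_metric, reference_metric):
    """Return target_metric / reference_metric, or None if the reference is zero."""
    if reference_metric == 0:
        return None  # assumption: the ratio is treated as undefined in this case
    return target_metric / reference_metric


# Hypothetical underlying metric: number of distinct categories in each dataset
target_distinct = len({"C", "D", "E", "F"})
reference_distinct = len({"A", "B", "C", "D", "E"})

ratio = ratio_metric(target_distinct, reference_distinct)
print(ratio)
```

A ratio of 1 means the metric is the same in both datasets; values below 1 mean the target metric is smaller than the reference metric.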
Relative entropy
Relative entropy in Validio is an adapted implementation of the symmetrised Kullback-Leibler divergence.
Relative entropy is used to detect distribution shifts between a target set and a reference set. It produces a nonnegative numerical metric: zero implies identical empirical distributions, and the value grows as the two distributions become increasingly different.
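To give a feel for the metric, the sketch below computes a textbook symmetrised Kullback-Leibler divergence, D(P||Q) + D(Q||P), over the empirical category distributions. It is not Validio's adapted implementation, whose details are not described here. The function name and the additive smoothing (used so that categories missing from one dataset do not cause a division by zero) are assumptions made for illustration.

```python
import math
from collections import Counter


def symmetrised_kl(target_values, reference_values, smoothing=1.0):
    """Textbook symmetrised KL divergence between two categorical samples.

    `smoothing` is an additive pseudo-count (an assumption of this sketch)
    that keeps every category probability strictly positive.
    """
    target_counts = Counter(target_values)
    reference_counts = Counter(reference_values)
    categories = set(target_counts) | set(reference_counts)

    def probabilities(counts):
        total = sum(counts.values()) + smoothing * len(categories)
        return {c: (counts.get(c, 0) + smoothing) / total for c in categories}

    p = probabilities(target_counts)
    q = probabilities(reference_counts)
    # D(P||Q) + D(Q||P) simplifies to sum((p - q) * log(p / q)),
    # and every term of that sum is nonnegative.
    return sum((p[c] - q[c]) * math.log(p[c] / q[c]) for c in categories)


reference = ["A", "B", "C", "D", "E"]
identical = symmetrised_kl(reference, reference)
shifted = symmetrised_kl(["C", "D", "E", "F"], reference)
print(identical, shifted)
```

In this sketch, identical samples give exactly zero and any difference in the category distribution gives a positive value. The absolute size of that value depends on the smoothing choice.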
Don’t have experience with relative entropy?
But still want to monitor distribution shifts? No worries! Apply a Smart Alert and monitor changes in relative entropy without having to worry about what the absolute value means.