Map of the territory
This page explains the concepts of Monitors and Filters in Validio and provides useful context for understanding them. For an overview and details of specific Monitors and Filters, check out the monitor overview page, the filter overview page and their corresponding sub-pages.
Monitors and Filters track specific metrics computed from one or several features. What a ‘metric’ is differs slightly depending on whether it belongs to a Monitor or a Filter. The difference stems from monitoring data at dataset level versus datapoint level: a Monitor checks dataset metrics on Dataset Pipelines, while a Filter filters datapoints in Datapoint Pipelines and computes a metric specific to Datapoint Pipelines.
A Monitor tracks a Dataset metric computed from one or more features (fields/columns). A few examples include:
- Mean of ‘age’
- Cardinality of ‘country’
- Standard deviation of ‘price’
- Mode of ‘education’ (category with highest number of records)
- Explained variance by the primary PCA component on target features ‘age’, ‘income’ and ‘credit_score’ (yes, this metric is supported in Validio; read more about PCA statistics here)
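To make these dataset metrics concrete, here is a minimal sketch that computes each example over a small, hypothetical batch of records using only the Python standard library. It shows textbook definitions, not Validio's implementation: the helper names (`mean_of`, `cardinality_of`, `stdev_of`, `mode_of`, `pca_first_component_ratio`), the use of the sample standard deviation, and the choice to standardize features before PCA are all assumptions.

```python
import math
import statistics

# Hypothetical batch of records; field names follow the examples above.
records = [
    {"age": 25, "country": "SE", "price": 120.0, "education": "BSc", "income": 31000, "credit_score": 640},
    {"age": 38, "country": "NO", "price": 450.0, "education": "MSc", "income": 52000, "credit_score": 710},
    {"age": 47, "country": "SE", "price": 980.0, "education": "MSc", "income": 67000, "credit_score": 760},
    {"age": 52, "country": "DK", "price": 300.0, "education": "PhD", "income": 71000, "credit_score": 790},
]

def column(rows, feature):
    return [row[feature] for row in rows]

def mean_of(rows, feature):
    return statistics.fmean(column(rows, feature))

def cardinality_of(rows, feature):
    # Number of distinct values in the feature.
    return len(set(column(rows, feature)))

def stdev_of(rows, feature):
    # Sample standard deviation (n - 1 denominator); an assumption.
    return statistics.stdev(column(rows, feature))

def mode_of(rows, feature):
    # Most frequent category: a string-valued metric.
    return statistics.mode(column(rows, feature))

def pca_first_component_ratio(rows, features, iterations=500):
    """Share of total variance explained by the first principal component.

    Features are standardized first (an assumption), so the covariance matrix
    is the correlation matrix. Constant features are not handled.
    """
    cols = []
    for f in features:
        values = column(rows, f)
        mu, sd = statistics.fmean(values), statistics.stdev(values)
        cols.append([(v - mu) / sd for v in values])
    k, n = len(cols), len(rows)
    cov = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1) for j in range(k)]
           for i in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [1.0] * k
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit eigenvector gives the largest eigenvalue.
    top = sum(vec[i] * cov[i][j] * vec[j] for i in range(k) for j in range(k))
    trace = sum(cov[i][i] for i in range(k))
    return top / trace

print(mean_of(records, "age"), cardinality_of(records, "country"), mode_of(records, "education"))
# 40.5 3 MSc
print(round(stdev_of(records, "price"), 2))
print(round(pca_first_component_ratio(records, ["age", "income", "credit_score"]), 3))
```

Each function returns a single scalar or string per batch, which is exactly the kind of value a Monitor tracks.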
A Monitor can also track reference metrics (statistics), that is, metrics produced by comparing features from two different datasets, e.g.:
- Ratio between two means: mean of ‘table1_age’ and mean of ‘table2_age’
- Relative entropy between ‘table1_basket_size’ and ‘table2_basket_size’ (used to check distribution shifts, learn more about Relative Entropy here)
- Number of new categories in ‘table1_country’ compared to ‘table2_country’
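The three reference metrics above can be sketched as plain functions over two lists of values, one from each dataset. This is an illustration under stated assumptions rather than Validio's implementation: the function names are hypothetical, relative entropy is computed as the Kullback-Leibler divergence D(P || Q) in nats over the empirical distributions of discrete values, and the Laplace smoothing (`alpha`) is an assumption added so that a value missing from the reference does not produce an infinite result.

```python
import math
import statistics
from collections import Counter

def ratio_of_means(current, reference):
    """Mean of the current feature divided by the mean of the reference feature."""
    return statistics.fmean(current) / statistics.fmean(reference)

def relative_entropy(current, reference, alpha=1.0):
    """KL divergence D(P || Q) in nats between two discrete empirical distributions.

    Laplace smoothing with `alpha` (an assumption) keeps every probability > 0.
    """
    support = set(current) | set(reference)
    p_counts, q_counts = Counter(current), Counter(reference)
    p_total = len(current) + alpha * len(support)
    q_total = len(reference) + alpha * len(support)
    kl = 0.0
    for value in support:
        p = (p_counts[value] + alpha) / p_total
        q = (q_counts[value] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl

def new_categories(current, reference):
    """Distinct categories in the current feature that never occur in the reference."""
    return len(set(current) - set(reference))

# Hypothetical values for the features named in the examples above.
table1_age, table2_age = [30, 40, 50], [20, 30, 40]
table1_basket_size, table2_basket_size = [1, 1, 2, 5, 5, 5], [1, 2, 2, 2, 3, 3]
table1_country, table2_country = ["SE", "NO", "DK"], ["SE", "NO"]

print(ratio_of_means(table1_age, table2_age))
print(relative_entropy(table1_basket_size, table2_basket_size))
print(new_categories(table1_country, table2_country))  # 1
```

A relative entropy of 0 means the two empirical distributions are identical; larger values indicate a larger distribution shift.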
What a Monitor ultimately tracks is a scalar value or a string value (e.g. the mode of a categorical feature) computed from one or several features, as in the examples above. This is what we refer to as a ‘Metric’ in the case of Monitors (not to be confused with the mathematical definition of a metric).
Filter metrics differ slightly from Monitor metrics. At a high level, a Filter does two things:
- As the name suggests, a Filter filters individual data points based on specified logic, e.g. if a numeric value is above a certain threshold
- Based on the filtered datapoints, a Filter will then produce a Metric, e.g. % of datapoints over the specified threshold
Let’s walk through an example to illustrate what we mean.
Imagine we get the following values from the ‘price’ feature over the last 60 seconds:
Five records with the ‘price’ feature shown. Other features are not relevant for this particular example.
Let’s say that we want to check that the value of ‘price’ in each record is between 100 and 1,000. Conceptually, you can think of filtering in Validio as creating a new boolean column in which each record is evaluated against the chosen logic.
| … | price | ‘price’ between 100-1000? | … |
|---|---|---|---|
Conceptual model of what happens in the ‘filtering’ stage. Note that no actual column is created in your data; it is shown for conceptual explanation purposes only.
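The conceptual boolean column can be sketched in a few lines of Python. The five prices below are hypothetical stand-ins, since the values from the original example are not reproduced here, and treating both bounds as inclusive is an assumption; `price_in_range` is a hypothetical helper, not a Validio API.

```python
# Hypothetical 'price' values from five records in one 60-second window.
prices = [85.0, 250.0, 999.0, 1500.0, 640.0]

LOWER, UPPER = 100, 1000

def price_in_range(price, lower=LOWER, upper=UPPER):
    # Inclusive bounds are an assumption; the example only says "between 100 and 1,000".
    return lower <= price <= upper

# Conceptual "filtering column": one boolean per record. Nothing is written back
# to the data; the list only exists so that metrics can be computed from it.
passes = [price_in_range(p) for p in prices]

for price, ok in zip(prices, passes):
    print(f"{price:>8.1f}  {ok}")
```

Here `passes` evaluates to `[False, True, True, False, True]`: the records with prices 85.0 and 1500.0 fail the check.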
Based on the filtered datapoints and the ‘illustrative filtering column’, a Filter then produces a metric, which is monitored. A Filter can produce five types of metrics:
| Metric description | Example output based on above example |
|---|---|
Filter metrics that Validio monitors. Alerts can be defined on any of these metrics, e.g. that failing data points should never exceed a certain count or percentage. How often the metrics are calculated is defined by a cron trigger when setting up a Datapoint Pipeline.
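As an illustration of the step from filtered datapoints to a monitored metric, the sketch below computes two of the quantities mentioned above, the count and the percentage of failing data points, and applies a simple threshold alert to them. It does not reproduce the full list of five Filter metrics, and `filter_metrics` and `should_alert` are hypothetical names; in Validio the computation would run on the schedule set by the pipeline's cron trigger, whereas here it runs once over a single hard-coded window.

```python
def filter_metrics(passes):
    """Count and percentage of failing data points in one evaluation window."""
    failing = sum(1 for ok in passes if not ok)
    total = len(passes)
    percent_failing = 100.0 * failing / total if total else 0.0
    return {"failing_count": failing, "failing_percent": percent_failing}

def should_alert(metrics, max_count=None, max_percent=None):
    """Hypothetical alert rule: fire when a metric exceeds its configured maximum."""
    if max_count is not None and metrics["failing_count"] > max_count:
        return True
    if max_percent is not None and metrics["failing_percent"] > max_percent:
        return True
    return False

# Boolean filtering results for five records (hypothetical, price range check).
passes = [False, True, True, False, True]
metrics = filter_metrics(passes)
print(metrics)  # {'failing_count': 2, 'failing_percent': 40.0}
print(should_alert(metrics, max_percent=25.0))  # True
```

Keeping the metric computation separate from the alert rule mirrors the idea that alerts are defined on the metric, not on the individual data points.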
Validio can also write out all the individual data points that are filtered as erroneous. Learn more about sinking out bad data points here.
While the details of a ‘Metric’ differ slightly between Monitors and Filters, as explained above, what they have in common is that the Metric is the computed quantity that Validio ultimately monitors and that alerts are applied to. Essentially, a Metric is the output of a Monitor or Filter that Validio monitors and validates.