Introduction

Monitors and Filters track a specified metric calculated from one or more features.

📘

Map of the territory

This page explains the concept of Monitors and Filters in Validio and gives the context needed to understand them. For an overview and details of specific Monitors and Filters, see the monitor overview page, the filter overview page, and their sub-pages.

Monitors and Filters track specific metrics computed from one or several features. What a ‘metric’ is differs slightly depending on whether it belongs to a Monitor or a Filter. The difference stems from whether data is monitored at dataset level or at datapoint level: a Monitor checks dataset metrics on Dataset Pipelines, while a Filter filters datapoints in Datapoint Pipelines and computes a metric that is specific to Datapoint Pipelines.

Multiple Monitors and Filters can be created for one Dataset or Datapoint Pipeline to track multiple metrics.

What does a Monitor track and what’s a Monitor metric?

A Monitor tracks a Dataset metric computed from one or more features (fields/columns). A few examples include:

  • Mean of ‘age’
  • Cardinality of ‘country’
  • Standard deviation of ‘price’
  • Mode of ‘education’ (category with highest number of records)
  • Explained variance by the primary PCA component on target features ‘age’, ‘income’ and ‘credit_score’ (yes, this metric is supported in Validio; read more about PCA statistics here)
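To make the first four examples concrete, here is a minimal sketch in plain Python. It does not use any Validio API; the `records` list, the `column` helper and all values are made up for illustration, and the choice of the sample standard deviation is an assumption rather than a statement about how Validio computes it.

```python
import statistics

# Hypothetical sample dataset; values are invented for illustration only.
records = [
    {"age": 25, "country": "SE", "price": 10.0, "education": "BSc"},
    {"age": 35, "country": "NO", "price": 20.0, "education": "MSc"},
    {"age": 45, "country": "SE", "price": 30.0, "education": "BSc"},
]


def column(name):
    """Return all values of one feature (column) as a list."""
    return [record[name] for record in records]


# Each metric reduces a whole feature to a single scalar or string value.
mean_age = statistics.mean(column("age"))
cardinality_country = len(set(column("country")))      # number of distinct values
std_price = statistics.stdev(column("price"))          # sample standard deviation (assumption)
mode_education = statistics.mode(column("education"))  # most frequent category

print(mean_age, cardinality_country, std_price, mode_education)
```

Each result is one value per dataset, which is the kind of quantity a Monitor tracks over time.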

A Monitor can also track reference metrics (statistics), that is metrics produced by comparing features from two different datasets, e.g.:

  • Ratio between two means: mean of ‘table1_age’ and mean of ‘table2_age’
  • Relative entropy between ‘table1_basket_size’ and ‘table2_basket_size’ (used to check distribution shifts, learn more about Relative Entropy here)
  • Number of new categories in ‘table1_country’ compared to ‘table2_country’
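The three reference metrics above could be sketched as follows. This is an illustration in plain Python, not Validio's implementation: the function names are hypothetical, and the relative entropy here treats every distinct value as its own category and requires each value in the first sample to also occur in the second. How Validio bins numeric features or handles unseen values is not specified on this page.

```python
import math
from collections import Counter


def ratio_of_means(current, reference):
    """Mean of `current` divided by mean of `reference`."""
    return (sum(current) / len(current)) / (sum(reference) / len(reference))


def relative_entropy(current, reference):
    """Kullback-Leibler divergence D(current || reference) of the empirical
    distributions of two discrete samples, in nats.

    Assumption: every value seen in `current` also appears in `reference`.
    """
    current_counts = Counter(current)
    reference_counts = Counter(reference)
    total = 0.0
    for value, count in current_counts.items():
        p = count / len(current)
        q = reference_counts[value] / len(reference)
        if q == 0:
            raise ValueError(f"value {value!r} does not occur in the reference sample")
        total += p * math.log(p / q)
    return total


def new_categories(current, reference):
    """Number of distinct values in `current` that never occur in `reference`."""
    return len(set(current) - set(reference))


# Hypothetical feature values from two tables.
print(ratio_of_means([20, 40], [10, 20]))
print(relative_entropy([1, 1, 1, 2], [1, 1, 2, 2]))
print(new_categories(["SE", "NO", "DK"], ["SE", "NO"]))
```

A relative entropy of zero means the two empirical distributions are identical; larger values indicate a larger shift.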

What a Monitor ultimately tracks is a scalar value or a string value (e.g. the mode of a categorical feature) computed from one or several features, as in the examples above. This value is what we refer to as a ‘Metric’ in the case of Monitors. (Not to be confused with the mathematical definition of a metric.)

What does a Filter track and what’s a Filter metric?

Filter metrics differ slightly from Monitor metrics. At a high level, a Filter does two things:

  1. As the name suggests, a Filter filters individual datapoints based on specified logic, e.g. whether a numeric value is above a certain threshold
  2. Based on the filtered datapoints, a Filter then produces a Metric, e.g. the percentage of datapoints over the specified threshold

Let’s walk through an example to illustrate what we mean.

A Filter example

Imagine we get the following values from the ‘price’ feature over the last 60 seconds:

price
14
455
324
29
5589

Five records showing the ‘price’ feature. Other features are not relevant for this example.

1. Filtering datapoints

Let’s say that we want to check that the value of ‘price’ in each record is between 100 and 1000. Conceptually, one way to think about filtering in Validio is to imagine that a new boolean column is created, in which each record is evaluated against the chosen logic.

price    ‘price’ between 100-1000?
14       FALSE
455      TRUE
324      TRUE
29       FALSE
5589     FALSE

Conceptual model of what happens in the ‘filtering’ stage. Note that no actual column is created in your data; the column is shown for explanation purposes only.
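The conceptual boolean column can be reproduced with a few lines of plain Python. This is only a sketch of the idea, not how Validio evaluates Filters internally; the names `LOWER`, `UPPER` and `passes` are hypothetical, and treating both bounds as inclusive is an assumption.

```python
# The five 'price' values from the example above.
prices = [14, 455, 324, 29, 5589]

# Filter logic: 'price' must lie between 100 and 1000 (bounds assumed inclusive).
LOWER, UPPER = 100, 1000

# One boolean per datapoint, mirroring the illustrative filtering column.
passes = [LOWER <= price <= UPPER for price in prices]

for price, ok in zip(prices, passes):
    print(price, "TRUE" if ok else "FALSE")
```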

2. Producing and monitoring a metric

Based on the filtered datapoints and the ‘illustrative filtering column’, a Filter will then produce a metric which will be monitored. A filter can produce five types of metrics:

Metric description    Example output based on above example
Passing               2
Failing               3
Passing percentage    40%
Failing percentage    60%
Total                 5

Filter metrics that Validio will monitor. Alerts can be defined on any of these metrics, e.g. that failing datapoints should never exceed a certain count or percentage. The Passing and Failing percentages are calculated over the datapoints ingested in the last 60 seconds.
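As a rough sketch of how the five metrics relate to each other, the following plain Python computes them for the example window. The function `filter_metrics`, its dictionary keys and the inclusive bounds are assumptions made for illustration and are not part of any Validio API.

```python
def filter_metrics(values, lower, upper):
    """Compute the five Filter metrics for one window of datapoints.

    A datapoint passes when lower <= value <= upper (inclusive bounds assumed).
    """
    total = len(values)
    passing = sum(1 for value in values if lower <= value <= upper)
    failing = total - passing
    return {
        "passing": passing,
        "failing": failing,
        # Percentages are relative to all datapoints in the window.
        "passing_percentage": 100 * passing / total if total else 0.0,
        "failing_percentage": 100 * failing / total if total else 0.0,
        "total": total,
    }


# 'price' values ingested during the last 60 seconds in the example.
metrics = filter_metrics([14, 455, 324, 29, 5589], lower=100, upper=1000)
print(metrics)
```

An alert rule would then compare one of these values against a limit, for example checking that `failing_percentage` stays below a chosen maximum.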

📘

Validio can also write out all the individual datapoints that are filtered as erroneous. Learn more about sinking out bad datapoints here.

Why is it called ‘Metric’ for both Monitors and Filters?

While the details of a ‘Metric’ differ slightly between Monitors and Filters, as explained above, in both cases it is the computed quantity that Validio ultimately monitors and that alerts are applied to. Essentially, a Metric is the output of a Monitor or Filter that Validio monitors and validates.

