About Thresholds

Thresholds define the conditions under which a validator treats a metric value as a data quality incident or anomaly. When the validator detects data that breaches the defined threshold, it creates an incident that you can inspect on the validator details page. Optionally, you can define rules that notify you about identified incidents and route the notifications to different channels, such as Slack or webhooks.

When you create a validator, you must configure a threshold to identify data quality incidents. The right type of threshold depends on the metric you are monitoring and the data quality issues you need to detect. Validio supports the following types of thresholds:

  • Dynamic Threshold: Automatically calculates thresholds for numeric metrics based on statistical methods and lets you adjust the sensitivity, which is the range of accepted threshold values. For example, you can define a dynamic threshold that tracks the daily average of sales to detect anomalies. For more information, see Configuring Dynamic Thresholds.
  • Fixed Threshold: Performs comparison operations between the metric and a specified numeric threshold. For example, you can define a fixed threshold to check that no values in the field "Age" are less than zero. For more information, see Configuring Fixed Thresholds.
  • Difference Threshold: Monitors a metric and alerts when the metric value deviates by a specified absolute value or percentage for consecutive windows. For example, you can track the mean of a numeric value and alert when the metric value has decreased by X percent over two consecutive days. For more information, see Configuring Difference Thresholds.

For more information about reviewing and managing validator incidents, see About Validator Details and About Validator Incidents. For information on how to configure rules and channels to receive notifications when validator incidents occur, see About Notifications.

Choosing the Right Threshold Type

The best choice of threshold depends on the nature of the metric you are monitoring and the specific types of data quality issues you need to detect.

Dynamic Threshold

When to use it:

  • For time-series metrics where the expected value fluctuates naturally, such as with daily or weekly patterns.
  • For exploratory monitoring of new data sources where patterns are not yet fully understood. Dynamic Threshold should be your go-to choice for unknown metrics.
  • When you need to detect anomalies or unexpected changes relative to the metric's recent history and seasonal behavior.
  • When you use Segmentation and cannot define a separate rule for each segment.

Examples:

  • Detecting spikes or drops in Daily Active Users compared to typical weekday or weekend patterns.
  • Monitoring Hourly Sales Volume to identify deviations from expected trends and seasonality.
  • Tracking API Latency to catch unexpected performance degradations relative to normal operational variations.
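
To make the idea of a threshold that follows recent history concrete, here is a minimal sketch. It is not Validio's algorithm, which this page does not describe; it swaps in a simple mean plus or minus a multiple of the standard deviation as a stand-in for "statistical methods". The function names and the `band_width` parameter (playing the role of sensitivity) are hypothetical.

```python
from statistics import mean, stdev

def dynamic_bounds(history, band_width=3.0):
    """Derive (lower, upper) bounds from recent metric values.

    Assumption: bounds are mean +/- band_width * sample standard deviation.
    A larger band_width accepts a wider range of values.
    """
    center = mean(history)
    spread = stdev(history)
    return center - band_width * spread, center + band_width * spread

def is_anomaly(value, history, band_width=3.0):
    """Return True when the value falls outside the bounds learned from history."""
    lower, upper = dynamic_bounds(history, band_width)
    return value < lower or value > upper

# Illustrative data only: one week of Daily Active Users.
daily_active_users = [1020, 980, 1005, 995, 1010, 990, 1000]

print(is_anomaly(1004, daily_active_users))  # False: within the usual range
print(is_anomaly(1400, daily_active_users))  # True: far above recent history
```

Because the bounds are recomputed from the history passed in, the same rule can be applied per segment without hand-tuning a limit for each one. A real implementation would also need to account for trend and seasonality, which this sketch ignores.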

Fixed Threshold

When to use it:

  • When you have absolute, known boundaries that should never be crossed based on business rules, logical constraints, or physical limitations.
  • For simple validity checks where the acceptable range is static and well-understood.
  • For freshness checks where the expected data freshness is known (for example, data should always be updated daily).

Examples:

  • Ensure values in an Age column are always Greater than or equal to 0.
  • Verify that a Discount Percentage field is always Less than or equal to 100.
  • Verify that the count of referential integrity violations (such as orphaned foreign keys) is Equal to 0.
  • Ensure freshness (time since last update) is Less than or equal to a specific duration (such as 24 hours).
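
The fixed-threshold examples above all reduce to a single comparison between a metric and a constant. The following sketch illustrates that, assuming the rule states the condition that must hold and an incident is raised when it does not. The operator names and the `is_incident` helper are hypothetical and are not part of any Validio API.

```python
import operator

# Hypothetical operator names that mirror the comparison wording in the examples.
COMPARISONS = {
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
    "equal": operator.eq,
}

def is_incident(metric_value, comparison, threshold):
    """Return True when `metric_value <comparison> threshold` does NOT hold."""
    return not COMPARISONS[comparison](metric_value, threshold)

# Illustrative metric values only.
print(is_incident(-3, "greater_than_or_equal", 0))   # minimum Age is negative -> True
print(is_incident(85, "less_than_or_equal", 100))    # maximum Discount Percentage -> False
print(is_incident(0, "equal", 0))                    # orphaned foreign key count -> False
print(is_incident(30, "less_than_or_equal", 24))     # hours since last update -> True
```

The threshold never changes with the data, which is why this type suits limits that come from business rules rather than from observed behavior.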

Difference Threshold

When to use it:

  • When the rate of change or the magnitude of shift between periods is more critical than the absolute value itself.
  • For monitoring stability or consistency over short time frames.

Examples:

  • Alerting if the number of orders Decreases by more than 20% compared to the previous day for 2 consecutive days.
  • Flagging if database size Increases by more than 10 GB (absolute value) compared to the previous hour.
  • Monitoring Error Rate to trigger an alert if it Strictly increases by more than 5 percentage points compared to the last 2 windows.
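
As a rough illustration of the first example above (orders decreasing by more than 20% for 2 consecutive days), the sketch below compares each window with the one before it. The exact way Validio evaluates consecutive windows is not described on this page, so treat the window-to-window comparison, the function names, and the parameters as assumptions.

```python
def percent_change(previous, current):
    """Percentage change from previous to current (assumes previous is non-zero)."""
    return (current - previous) / previous * 100.0

def decreased_for_consecutive_windows(values, percent, windows):
    """Return True when each of the last `windows` values dropped by more
    than `percent` percent relative to the value immediately before it."""
    if len(values) < windows + 1:
        return False  # not enough history to evaluate the rule
    recent = values[-(windows + 1):]
    return all(
        percent_change(prev, cur) < -percent
        for prev, cur in zip(recent, recent[1:])
    )

# Illustrative daily order counts only.
sustained_drop = [1000, 990, 760, 590]   # roughly -23% and then -22%
single_drop = [1000, 990, 760, 700]      # roughly -23% and then -8%

print(decreased_for_consecutive_windows(sustained_drop, percent=20, windows=2))  # True
print(decreased_for_consecutive_windows(single_drop, percent=20, windows=2))     # False
```

Requiring the change to persist across several windows filters out one-off dips, which is the main reason to prefer this type when stability matters more than the absolute value.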

Use Case Examples

| Use Case Scenario | Recommended Threshold Type | Reason |
| --- | --- | --- |
| Enforcing strict business rules (such as age >= 0) | Fixed | Boundaries are known, static, and absolute. |
| Ensuring data is never older than 1 day | Fixed | A static, maximum acceptable Freshness limit. |
| Verifying referential integrity (0 violations) | Fixed | A binary check; the desired count of violations is a fixed value (0). |
| Detecting unusual spikes in website traffic | Dynamic | "Normal" traffic varies; need to detect deviations from learned patterns. |
| Monitoring daily sales for anomalies | Dynamic | Sales often have trends or seasonality; fixed limits are impractical. |
| Alerting on persistent drops in daily user signups | Difference | Concerned about a steady change over the previous few days. |
| Checking if a critical value never exceeds 99.9% | Fixed | A hard, unchanging limit based on system constraints. |
| Monitoring a new metric with unknown patterns | Dynamic | Allows the system to learn the pattern and highlight deviations. |