Anomaly detection with Dynamic Thresholds

Dynamic Thresholds use a combination of smart algorithms to automatically detect anomalies in your data. The threshold model infers trends, seasonality, and peaks, and also adapts to shifts in your data. It learns from historical data and is trained on new data, continuously improving as more data is read.

When applied to a backfilled source, the dynamic thresholds can quickly detect upcoming anomalies without any training period. This means you get incidents and insight immediately, even if you lack the domain knowledge to create appropriate thresholds. You can also provide input to improve the anomaly detection algorithm. For more information, see Model Feedback and Retraining.

Dynamic thresholds continuously track your data and automatically update when they detect shifts in seasonality and trends. You can use dynamic thresholds to monitor sources where you expect changes in your data over time. For more information, see Seasonality Detection.

Dynamic thresholds include functionality to estimate the support of the metric and automatically adjust the lower decision bound accordingly. For more information, see Metric Support.

Dynamic Threshold Algorithm

Dynamic Threshold algorithm V2 was introduced in Validio 5.1 and made the default algorithm in Validio 6.0.

V2 features and benefits include:

Reworked Anomaly Detection Engine: Introduces a new memory-efficient architecture to power more effective and accurate, real-time anomaly detection. V2 overcomes the limitations of the V1 algorithm with improvements in handling of zeros, detection of level shifts, and stronger default behavior when adapting bounds to new data points.
Improved Freshness Validators: Freshness alerts now adapt better to your data's cadence and systematic changes (like seasonality or shifts), while alerting more consistently on stale data.
More Accurate Anomaly Detection: Detection bounds adjust more precisely around level shifts and incidents. V2 adapts more accurately to data baseline changes by intelligently handling shifts and reversions (reducing false alarms), more consistently detecting true change points, and reliably tracking multiple sequential shifts.

Dynamic Threshold Parameters

The following table lists the parameters for configuring a dynamic threshold. All validator types will have the same configuration options.

Parameter name	Parameter value	Validator type
(Preset) Sensitivity	(Wide) 1.2, (Default) 2, (Narrow) 3.2, (Custom) positive floating-point value	All
Decision Bounds	Upper and Lower, Upper, Lower	All
Adaption Rate	Fast, Slow	All

Sensitivity

Sensitivity defines the accepted range of values for the dynamic threshold.

Higher sensitivity (lower threshold)–Means that the accepted range of values is more narrow, and the model will identify more data quality incidents or anomalies, leading to more alerts. Higher sensitivity is best suited for your most important tables.
Lower sensitivity (higher threshold)–Implies a wider range of accepted values, resulting in fewer incidents and alerts. Lower sensitivity is ideal for less important tables that have historically produced noisy incidents.

Setting the right sensitivity is often an iterative process to find a balance between false positives and alert fatigue versus false negatives and missing real errors. The typical starting sensitivity value for testing is between 2 and 3. The default sensitivity in Validio is 2.0.

The following table maps the numeric value of Validio sensitivity presets to standard deviations:

Sensitivity Preset Options	Validio Sensitivity Values	Standard Deviations
Narrow	3.2	2.5
Default	2.0	4
Wide	1.2	5.5
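
As a rough illustration of the table above, the following Python sketch looks up the standard-deviation multiplier for each preset and derives symmetric bounds around a mean. The `bounds_for` helper and the simple "mean plus or minus k standard deviations" rule are assumptions made for illustration only; the actual model also accounts for trend, seasonality, and level shifts.

```python
# Preset name -> (Validio sensitivity value, standard deviations), copied from the table above.
PRESET_STD_DEVS = {
    "narrow": (3.2, 2.5),
    "default": (2.0, 4.0),
    "wide": (1.2, 5.5),
}


def bounds_for(preset: str, mean: float, std: float) -> tuple[float, float]:
    """Hypothetical helper: symmetric bounds at mean +/- k standard deviations."""
    _sensitivity, k = PRESET_STD_DEVS[preset]
    return (mean - k * std, mean + k * std)


# Compare the accepted range for each preset on the same (made-up) mean and spread.
for preset in PRESET_STD_DEVS:
    lower, upper = bounds_for(preset, mean=100.0, std=2.0)
    print(f"{preset}: accepted range [{lower}, {upper}]")
```

A higher sensitivity value corresponds to fewer standard deviations, and therefore to a narrower accepted range.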

Decision Bounds

The decision bounds type on the dynamic threshold specifies whether the boundaries for anomaly detection are double or single-sided:

Upper and lower–Detects both upper and lower anomalies.
Upper–Treats only upward deviations as anomalies. This is the default for freshness validators, for example, because you want to be alerted when your data is late, not when it is fresher than expected.
Lower–Treats only downward deviations as anomalies.
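
The three decision bounds types can be sketched as a small check. The `BoundsType` enum and the `is_anomaly` function below are hypothetical names used for illustration, not part of any Validio API, and the bounds themselves are assumed to be given.

```python
from enum import Enum


class BoundsType(Enum):
    UPPER_AND_LOWER = "upper_and_lower"
    UPPER = "upper"
    LOWER = "lower"


def is_anomaly(value: float, lower: float, upper: float, bounds_type: BoundsType) -> bool:
    """Flag a value according to the selected decision bounds type."""
    above = value > upper
    below = value < lower
    if bounds_type is BoundsType.UPPER:
        return above          # downward deviations are ignored
    if bounds_type is BoundsType.LOWER:
        return below          # upward deviations are ignored
    return above or below     # double-sided


# A value below the lower bound is only flagged when the lower bound is active.
print(is_anomaly(5.0, 10.0, 20.0, BoundsType.UPPER))
print(is_anomaly(5.0, 10.0, 20.0, BoundsType.LOWER))
```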

Adaption Rate

You can configure how quickly the dynamic threshold adapts to changes in your data. This is controlled by the adaption rate parameter, which offers two settings:

Fast–The dynamic threshold model responds rapidly to shifts in your data's behavior. This means that if a new trend or pattern emerges, the bounds within which data is considered "normal" will adjust quickly. This is useful if your data patterns change often.
Slow–The dynamic threshold model adjusts its bounds more gradually. The model still adapts to changes, but it does so over a longer period, giving more weight to historical data. This setting is ideal for more stable data where you do not want the bounds to change rapidly but still want alerts on large outliers.

👍
Backfill after changing the adaption rate
For an existing validator, changing the adaption rate from Fast to Slow or vice versa without a backfill results in a discontinuity in the dynamic threshold bounds on the first window after the change is applied. For consistent behavior on historical data, we recommend that you either backfill the source after changing the adaption rate, or duplicate the validator and backfill the new one.

Adaption Rate Examples

The following example illustrates bounds of dynamic threshold with different adaption rates. Bounds for Fast widen and narrow more rapidly with changes in data, while the bounds for Slow are more persistent in following the downward trend and triggering alarms when the trend changes.
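
To build intuition for the difference, the sketch below uses an exponentially weighted mean and variance as a stand-in for the threshold model. This is not Validio's algorithm: the smoothing factors in `ALPHA`, the `ewma_bounds` function, and the multiplier `k` are all assumptions chosen for illustration.

```python
# Hypothetical smoothing factors; the real Fast and Slow settings are not documented here.
ALPHA = {"fast": 0.5, "slow": 0.1}


def ewma_bounds(values, alpha, k=3.0):
    """Return the (lower, upper) bounds that were in effect before each point after the first."""
    mean, var = values[0], 0.0
    bounds = []
    for x in values[1:]:
        std = var ** 0.5
        bounds.append((mean - k * std, mean + k * std))
        # Exponentially weighted update: a larger alpha gives recent data more weight.
        diff = x - mean
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return bounds


# A metric that jumps from 10 to 20 and stays there.
series = [10.0] * 5 + [20.0] * 5
for name, alpha in ALPHA.items():
    lower, upper = ewma_bounds(series, alpha)[-1]
    print(f"{name}: centre of final bounds = {(lower + upper) / 2:.2f}")
```

With the larger smoothing factor, the centre of the bounds moves toward the new level in fewer steps. With the smaller factor, historical data keeps more weight and the bounds follow more slowly.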

Use Cases and Recommendations

The choice between Fast and Slow adaption depends on the characteristics of your data and your monitoring goals.

Use Case	Recommended Setting	Explanation
Volatile Metrics: Metrics that frequently experience significant, genuine shifts (such as stock prices and social media trends).	`Fast`	You want the threshold to quickly adapt to new trends and avoid prolonged periods of false positives. The bounds "catch up" and react quickly to recent variations in data.
Stable Metrics with Occasional Spikes: Metrics that are generally stable but may have short-term, anomalous spikes (such as conversion rates and error rates).	`Slow`	You want to minimize the impact of temporary deviations. Slow adaption rate makes Dynamic Threshold less sensitive to small variations in data (noise), while still catching large spikes and deviations.
Metrics with Changing Trends: Metrics that exhibit slow, consistent changes over time (such as user growth and monthly sales).	`Slow`	The Slow setting puts more weight on historical data which makes the bounds more persistent in following a trend. This is useful to alert on changes in trends and slow moving metrics.
New Data Sources: When monitoring a new data source with an unknown pattern.	`Fast`	Use this for new data where you want to avoid too many incidents initially. The Fast adaption rate allows the bounds to react and adjust quickly, making alerts less frequent.

📘
Default adaption behavior
The Fast setting represents the previous default behavior of Validio's dynamic thresholds. If you have existing validators using dynamic thresholds and do not explicitly set the adaption rate, they will continue to behave as before (with Fast adaption) until you manually change the adaption rate.

Model Feedback and Retraining

Model retraining with false negative feedback (and seasonality)

You can help improve anomaly detection on dynamic thresholds by giving feedback on both false positives (incorrectly flagged incidents) and false negatives (missed incidents). This feedback is used to retrain the dynamic threshold model for specific segments, making it more precise over time. This targeted retraining helps minimise alert fatigue and ensures the model catches subtle, context-specific anomalies.

Dynamic thresholds will only use feedback that shares a similar context to the current data. This means that they exclude information that no longer seems relevant. For example, if you are validating your conversion rate and you change the definition inside your warehouse such that the scale of the metric increases by a factor of 100, then the dynamic threshold will not use the model feedback that was previously given to it.

False Positive Feedback: Widening the Bounds

When the model incorrectly flags a data point as an anomaly, you can mark it as a False Positive. This feedback tells the model that it was too sensitive in that specific context.

To provide this feedback, simply change the triage state of the detected incident to "False Positive". The model learns from this input and becomes less likely to flag future data points that have a similar value in a similar context. This action effectively widens the decision bounds for that pattern, reducing noise and alert fatigue.

For example, as shown in the figure for the "TOTAL_SALES_AMOUNT" Validator, three data points exceeded the dynamic threshold. By marking these incidents as "False Positive", the model will learn from these cases, adjusting the bounds so that values in a similar context and magnitude are less likely to be flagged in the future.

📘
False Positive and Retraining
Changing the incident status to False Positive will always retrain the model. If you want to mark the incident as false positive but not give feedback, change the status to Resolved and write an incident Comment explaining that it is a false positive. For more information, see Comments on Incidents.

False Negative Feedback: Tightening the Bounds

A false negative occurs when a data point is a genuine anomaly, but the system did not create an incident for it. This can happen if the dynamic threshold bounds have become too wide, perhaps after a period of high variance.

To provide this feedback, you can manually select a non-incident data point in the graph and mark it as False Negative with the toggle. This action tells the model that it was not sensitive enough. The model uses this feedback to tighten its decision bounds more quickly for that specific data pattern, improving its ability to catch similar anomalies in the future. This is especially effective for recalibrating bounds that have "blown up" or become excessively wide. Only mark data points as False Negative if they are genuinely missed anomalies.

Marking a non-incident data point as False negative

Similar data patterns are now detected as anomalies

Without feedback, the data points are not detected as incidents

Reverting Feedback

You can revert (undo) both False Positive and False Negative feedback. In the Validator graph, x marks data points for which feedback was given.

To undo False Negative feedback, click the data point in the graph and use the toggle to switch it back.
To undo False Positive feedback, change the incident status from False Positive to any other status (including Triage).

Model Feedback versus Sensitivity

While both model feedback and sensitivity settings adjust the model's alerting behaviour, they serve different purposes and operate at different scopes. Use sensitivity for broad, initial calibration and model feedback for targeted, ongoing adjustments.

Feature	Sensitivity	Model Feedback
Scope	Global: Applies to all segments monitored by the Validator.	Per-Segment: Applies only to the specific segment where feedback was given.
Primary Use Case	Initial Calibration: Used to set the overall tolerance for false positives vs. false negatives when first configuring a validator.	Targeted Fine-Tuning: Used to correct specific incorrect (FP) or missed (FN) detections on an ongoing basis.
Effect on Bounds	Adjusts the overall width of the threshold bounds.	Performs a one-sided adjustment which can be used to only adjust the upper or lower bound.

📘
Progressive Changes and Improvements
Changes appear progressively as more feedback is provided. Each piece of feedback makes the threshold more attuned to the specific patterns in your data.
Model feedback will also be used to improve the underlying algorithms that power the Dynamic Threshold in future releases of Validio.

Seasonality Detection

Dynamic thresholds can automatically adapt to seasonality patterns in your data that are related to the calendar. You do not have to enable or configure this feature. When there is enough evidence in your data to support the pattern detection, the dynamic threshold will adapt and not trigger an incident if it is caused by the seasonality.

Calendric Seasonality: Seasonal patterns can appear in your data due to the calendar. Calendric seasonality can relate to business processes and cycles where work may be planned and reviewed in regular cycles that may be weekly, bi-weekly, or monthly, and this behavior is reflected in your data. One example of calendric seasonality is recognizing that a Volume validator returns 0 on all days except the days when the pipeline runs and ingests data.
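
The Volume validator example above can be sketched as a per-weekday profile. The data, the Monday-only schedule, and the `weekday_profile` function are hypothetical; the sketch only shows why a zero on a non-pipeline day need not be an anomaly once the weekly pattern is known.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def weekday_profile(observations):
    """Average value per weekday (0 = Monday) from (date, value) pairs."""
    by_weekday = defaultdict(list)
    for day, value in observations:
        by_weekday[day.weekday()].append(value)
    return {weekday: mean(values) for weekday, values in by_weekday.items()}


# Hypothetical volume metric: the pipeline ingests 1000 rows on Mondays only.
start = date(2024, 1, 1)  # a Monday
days = [start + timedelta(days=i) for i in range(28)]
history = [(day, 1000 if day.weekday() == 0 else 0) for day in days]

profile = weekday_profile(history)
for weekday in sorted(profile):
    print(weekday, profile[weekday])
```

Comparing each new value against the expectation for its own weekday, instead of a single global average, keeps the regular zero-volume days from being flagged.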

Metric Support

Dynamic thresholds include functionality to estimate the support of the metric by partitioning the sample space (where data can appear) into negative values, zeros, and positive values. Depending on how frequently values fall into each partition, the metric gets an estimated positive, non-negative, or unbounded support. The estimated support is not static; it can change over time.

Depending on the estimated support, the lower decision bound is adapted:

If the estimated support is positive, values which are zero or negative are considered incidents.
If the estimated support is non-negative, negative values are considered incidents.
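
The rules above can be sketched as follows. The `estimate_support` and `violates_support` functions are hypothetical, and the `tolerance` parameter is an assumed stand-in for however the frequency of each partition is actually weighed.

```python
def estimate_support(values, tolerance=0.0):
    """Classify the support as 'positive', 'non_negative', or 'unbounded'.

    `tolerance` is an assumed cut-off: the share of values in a partition
    that is ignored before the partition counts as part of the support.
    """
    n = len(values)
    if n == 0:
        return "unbounded"
    negative_share = sum(v < 0 for v in values) / n
    zero_share = sum(v == 0 for v in values) / n
    if negative_share > tolerance:
        return "unbounded"
    if zero_share > tolerance:
        return "non_negative"
    return "positive"


def violates_support(value, support):
    """Apply the lower-bound rule for the estimated support."""
    if support == "positive":
        return value <= 0
    if support == "non_negative":
        return value < 0
    return False  # unbounded: no support-based adjustment in this sketch


support = estimate_support([12, 7, 30, 18])
print(support, violates_support(0, support))
```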

Level-shift Detection

Dynamic threshold incorporates advanced level-shift detection to identify and adapt to abrupt and persistent changes in your data's values. A "level shift" often indicates a systemic event, like a configuration change or feature rollout, that causes the data's baseline to permanently move. This capability ensures you get alerted when such sudden changes occur, and that the anomaly detection remains accurate after the change.

This system works by analyzing patterns of consecutive incidents. If a sustained deviation suggests a new stable baseline, statistical tests confirm the shift. Once confirmed, the dynamic threshold intelligently recalibrates by adjusting its training data to this new normal, preventing future false positives based on the old baseline and ensuring continued reliability for evolving data patterns. This adaptive process applies iteratively, ensuring accurate tracking even through multiple sequential level shifts and maintaining reliability as your data evolves.
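
A heavily simplified sketch of this idea is shown below. It is not the statistical test Validio uses: the `detect_level_shift` and `recalibrate` functions, the minimum run length `min_run`, and the stability check are all assumptions made for illustration.

```python
from statistics import mean, stdev


def detect_level_shift(history, recent, min_run=5, k=3.0):
    """Return True if `recent` looks like a new stable baseline, not isolated outliers."""
    if len(recent) < min_run or len(history) < 2:
        return False
    base_mean, base_std = mean(history), stdev(history)
    # 1. Sustained deviation: every recent point is out of bounds on the same side.
    all_above = all(x > base_mean + k * base_std for x in recent)
    all_below = all(x < base_mean - k * base_std for x in recent)
    if not (all_above or all_below):
        return False
    # 2. Stability: the recent points are tightly grouped around their own level.
    return stdev(recent) <= 2 * max(base_std, 1e-9)


def recalibrate(history, recent):
    """After a confirmed shift, train only on data from the new level."""
    return list(recent)


history = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9]
shifted = [20.1, 19.8, 20.3, 19.9, 20.0]
if detect_level_shift(history, shifted):
    history = recalibrate(history, shifted)
print(history)
```

A single spike, or a run of points that alternates between the old and new levels, does not pass both checks and so leaves the training data unchanged.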

Dynamic threshold can also transition between two levels, or baselines, if it deems that the new level is a reversion to an old one. That is, if dynamic threshold has recently seen one or more level shifts and there is a new incident group, it will try to connect the new incidents with the levels that were previously seen.

Common scenarios where Level-shift Detection is useful:

Adapting to step changes in metrics, with improved handling of temporary shifts and reversions to prevent over-alerting.
Recognizing new baselines more consistently after feature rollouts or configuration updates.
Handling recurring level shifts that follow a calendric seasonality (for example, on the 1st of each month).
Identifying when data flatlines, becomes unusually noisy, or shows other significant changes in variance.