Smart

Filter individual outlier datapoints based on empirical data

Configuration parameters

Parameter name and description    Parameter values
1. Name                           Arbitrary string
2. Target feature                 List of source features with numeric data types
3. Sensitivity                    Positive float
4. Smoothing                      Integer >= 1
5. Minimum absolute difference    Float >= 0
6. Minimum relative difference    Float >= 0 (%)
7. Computed metric                • Passing
                                  • Failing
                                  • Passing percentage
                                  • Failing percentage
                                  • Total
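
The value constraints in the parameter list can be checked before a filter is created. The following is a minimal sketch, not the product's API: the class `SmartFilterConfig`, its field names, and the lowercase metric strings are hypothetical names chosen for illustration, and only the constraints themselves come from the list above.

```python
from dataclasses import dataclass

# Allowed values for "Computed metric", taken from the parameter list.
COMPUTED_METRICS = {
    "passing",
    "failing",
    "passing percentage",
    "failing percentage",
    "total",
}


@dataclass(frozen=True)
class SmartFilterConfig:
    """Hypothetical container for the seven parameters (illustration only)."""

    name: str
    target_feature: str
    sensitivity: float
    smoothing: int
    min_absolute_difference: float
    min_relative_difference_pct: float
    computed_metric: str

    def __post_init__(self):
        # Sensitivity: positive float.
        if self.sensitivity <= 0:
            raise ValueError("sensitivity must be a positive number")
        # Smoothing: integer >= 1 (bool is rejected although it subclasses int).
        if (
            isinstance(self.smoothing, bool)
            or not isinstance(self.smoothing, int)
            or self.smoothing < 1
        ):
            raise ValueError("smoothing must be an integer >= 1")
        # Minimum absolute / relative difference: >= 0.
        if self.min_absolute_difference < 0:
            raise ValueError("minimum absolute difference must be >= 0")
        if self.min_relative_difference_pct < 0:
            raise ValueError("minimum relative difference must be >= 0 (%)")
        # Computed metric: one of the five listed options (case-insensitive here).
        if self.computed_metric.lower() not in COMPUTED_METRICS:
            raise ValueError(f"unknown computed metric: {self.computed_metric!r}")


config = SmartFilterConfig(
    name="price outliers",          # arbitrary string
    target_feature="price",         # hypothetical numeric source feature
    sensitivity=2.5,
    smoothing=7,
    min_absolute_difference=0.0,
    min_relative_difference_pct=10.0,
    computed_metric="Failing percentage",
)
```

Validating at construction time means an invalid value such as a smoothing of 0 fails immediately, rather than surfacing later as a confusing result.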

Parameter details

For each sessionized datapoint batch, a smart filter will compare all of the new datapoints to a modeled empirical distribution based on the most recent batches. How much empirical data is taken into account in the modeled distribution is governed by the smoothing parameter, while the sensitivity parameter controls the bounds for what should be considered an anomaly.

Tip: Setting the right parameter values is often an iterative process at first, balancing false positives (and alert fatigue) against false negatives (and missed real errors).

Sensitivity

A higher value causes more datapoints to be labeled as anomalies. For example, a sensitivity value of 3 will label all datapoints beyond 3𝛔 from the mean as anomalies, where 𝛔 is the estimated standard deviation. The sensitivity value is inversely proportional to the bounds outside which datapoints are considered anomalies:

Anomaly bound = (3/X)*3𝛔 where X is the sensitivity value

E.g. a sensitivity value of 5 gives (3/5)*3𝛔 = 1.8𝛔, i.e. all datapoints more than 1.8𝛔 from the mean would be considered anomalies.

Typical starting values to test are 2-3.
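
To see how the bound changes with the sensitivity value, the formula above can be written as a small function. This is a sketch of the arithmetic only: the function names are hypothetical, and the mean and 𝛔 are assumed to be supplied by the modeled empirical distribution.

```python
def anomaly_bound(sensitivity: float, sigma: float) -> float:
    """Distance from the mean beyond which a datapoint is flagged.

    Implements: Anomaly bound = (3 / X) * 3 * sigma, where X is the sensitivity.
    """
    if sensitivity <= 0:
        raise ValueError("sensitivity must be a positive number")
    return (3.0 / sensitivity) * 3.0 * sigma


def is_outside_bound(value: float, mean: float, sigma: float, sensitivity: float) -> bool:
    """True if `value` lies further from `mean` than the anomaly bound."""
    return abs(value - mean) > anomaly_bound(sensitivity, sigma)


# Bounds in units of sigma for a few sensitivity values.
for x in (2, 3, 5):
    print(x, round(anomaly_bound(x, 1.0), 2))
```

With 𝛔 = 1 this prints bounds of 4.5, 3.0 and 1.8 for sensitivity values 2, 3 and 5: raising the sensitivity narrows the bound, so more datapoints are flagged.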

Smoothing

The smoothing parameter governs how much of the historical data will be taken into account when modeling the empirical distribution. Heuristically, the value of the smoothing parameter can roughly be thought of as the number of historic sessionized batches taken into account.

Choose a lower smoothing value if you know that your data is prone to distribution shifts, and a higher smoothing value if you expect your distribution to be fairly stable.

Typical starting values to test are 5-10.
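
The heuristic reading of smoothing as "number of historic batches" can be illustrated with a fixed-size window. The exact way the product models the empirical distribution is not described here, so the sketch below is an assumption: it keeps the most recent `smoothing` batches and pools them into a mean and standard deviation. The class `RollingReference` is a hypothetical name.

```python
from collections import deque
from statistics import fmean, pstdev


class RollingReference:
    """Illustrative reference distribution built from the last N batches.

    Assumption: a plain rolling window; the real model may weight batches differently.
    """

    def __init__(self, smoothing: int):
        if smoothing < 1:
            raise ValueError("smoothing must be an integer >= 1")
        # A deque with maxlen silently drops the oldest batch when full.
        self._batches = deque(maxlen=smoothing)

    def add_batch(self, batch):
        self._batches.append(list(batch))

    def mean_and_sigma(self):
        """Pooled mean and population standard deviation of the retained batches."""
        values = [v for batch in self._batches for v in batch]
        if not values:
            raise ValueError("no reference data yet")
        return fmean(values), pstdev(values)


reference = RollingReference(smoothing=2)
for batch in ([100.0, 100.0], [2.0, 4.0], [6.0, 8.0]):
    reference.add_batch(batch)

# Only the two most recent batches contribute; the first batch has been dropped.
mean, sigma = reference.mean_and_sigma()
```

A small window adapts quickly after a distribution shift but gives a noisier estimate, while a large window is steadier but slower to follow a shift.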

Minimum absolute difference

The minimum absolute difference between the feature value and the mean of the reference distribution for the point to be considered an outlier/anomaly.

E.g. if set to 10, the difference between the point being validated and the mean of the reference distribution must be greater than 10 or less than -10, and the point must also fall outside the bounds set by the smart filter, for it to be considered an anomaly. This is essentially an "ignore any alerts within this difference" parameter.
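
The two conditions in the example above combine with a logical AND. Here is a minimal sketch, assuming the anomaly bound has already been computed as a distance from the reference mean. The function and parameter names are hypothetical.

```python
def is_anomaly(value: float, reference_mean: float, bound: float, min_abs_diff: float) -> bool:
    """Flag `value` only if it is outside the bound AND far enough in absolute terms.

    `bound` is the anomaly bound expressed as a distance from the reference mean.
    """
    diff = abs(value - reference_mean)
    return diff > bound and diff > min_abs_diff


# Reference mean 100, anomaly bound of 4, minimum absolute difference of 10.
print(is_anomaly(107.0, 100.0, bound=4.0, min_abs_diff=10.0))  # False: outside the bound, but within 10
print(is_anomaly(111.0, 100.0, bound=4.0, min_abs_diff=10.0))  # True: difference greater than 10
print(is_anomaly(89.0, 100.0, bound=4.0, min_abs_diff=10.0))   # True: difference less than -10
```

The parameter is most useful when the reference distribution is very narrow, where the bound alone would flag deviations too small to matter.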

Minimum relative difference

The minimum difference for a point to be considered an anomaly, expressed in relative terms: the absolute difference divided by the absolute value of the mean of the reference data.

E.g. if the mean of the reference distribution is 10 and the user sets the parameter value to 10%, data points falling between 9 and 11 will not be considered anomalies.

Use this option instead of 'Minimum absolute difference' when you care more about the relative difference to the reference mean than the absolute difference.
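
The relative check can be sketched as follows, using the definition above (absolute difference divided by the absolute value of the reference mean, expressed as a percentage). The function names are hypothetical, and two details are assumptions because they are not specified here: the comparison is strict, so values exactly on the threshold are not flagged, and a reference mean of 0 is rejected.

```python
def relative_difference_pct(value: float, reference_mean: float) -> float:
    """Absolute difference to the reference mean, as a percentage of |mean|."""
    if reference_mean == 0:
        # Assumption: behaviour for a zero mean is not specified, so we refuse.
        raise ValueError("relative difference is undefined for a reference mean of 0")
    return abs(value - reference_mean) * 100.0 / abs(reference_mean)


def exceeds_min_relative_difference(
    value: float, reference_mean: float, min_rel_diff_pct: float
) -> bool:
    """True if the point is far enough from the mean, in relative terms, to be flagged."""
    return relative_difference_pct(value, reference_mean) > min_rel_diff_pct


# Reference mean 10 with a 10% minimum relative difference.
for point in (9, 10.5, 11, 12):
    print(point, exceeds_min_relative_difference(point, 10, 10.0))
```

For the reference mean of 10 and a 10% setting, the points 9, 10.5 and 11 are not flagged, while 12 (a 20% difference) is.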