Root Cause Analysis

Validio root cause analysis (RCA) automatically groups incidents that occur on validators around the same time. Using information about the validator and source configurations, the shared fields that are referenced between incident groups, and the upstream or downstream lineage dependencies, Validio finds the causal and correlational relationships among incidents within the source and across sources in the lineage graph. By grouping the incidents together, Validio also helps to minimize the number of alerts that are sent.

Validio combines detailed data lineage with data quality monitoring to provide you with an analysis that explains why the incident occurred, relates the incident to other incidents in your catalog, and describes how the incident might affect other catalog assets. You can view this analysis in the Incidents > Root Cause tab.

Prerequisites for RCA

Validio RCA does not require configuration. However, RCA works only on sources that have validators. Without validators on a source, Validio cannot create incidents or track incident groups.

📘

Note

Although RCA works for all validator types, some validators, such as Freshness and Volume validators, can provide more causal context. For more information, see About Validators.

RCA Relationships

Whenever an incident occurs on a validator, Validio assigns it to an incident group, which is a collection of consecutive incidents for that specific validator. If the previous datapoint processed by the validator was an incident, then the current incident is assigned to the same group. Otherwise, the current incident is assigned to a new group.
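
The grouping rule above can be illustrated with a minimal Python sketch. It is not Validio's implementation: the `Datapoint` class and the `group_incidents` function are hypothetical names introduced here, and the sketch assumes the datapoints for a single validator are already in processing order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datapoint:
    """Hypothetical record of one datapoint processed by a single validator."""
    timestamp: str
    is_incident: bool


def group_incidents(datapoints: List[Datapoint]) -> List[List[Datapoint]]:
    """Group consecutive incidents for one validator.

    An incident joins the current group if the previous datapoint was also
    an incident. Otherwise it starts a new group.
    """
    groups: List[List[Datapoint]] = []
    previous_was_incident = False
    for dp in datapoints:
        if dp.is_incident:
            if previous_was_incident:
                groups[-1].append(dp)   # extend the open group
            else:
                groups.append([dp])     # start a new group
        previous_was_incident = dp.is_incident
    return groups
```

For example, a sequence of datapoints flagged as normal, incident, incident, normal, incident would produce two groups under this rule: one with two incidents and one with a single incident.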

Validio uses information (such as the source and validator configurations, field-level lineage, dbt manifests, and so on) from these incident groups to identify causal and correlational relationships among them. Because Validio understands how data flows through the system and what types of events can cause others, it can identify if an incident group is “caused by” or “correlates with” another incident group.

📘

Note

Incident groups can occur on different validators that have something in common, such as a field that one validator monitors and another validator references in its filter. Because the validators share that field, their incidents are likely to correlate.
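
As a rough illustration of the shared-field case in the note above, the following Python sketch checks whether two validator configurations reference a common field. Everything in it is an assumption made for illustration: `ValidatorConfig`, `referenced_fields`, and `shared_fields` are hypothetical names, not part of Validio's API, and the sketch assumes that a field is only "shared" when both validators are on the same source.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class ValidatorConfig:
    """Hypothetical, simplified view of a validator's configuration."""
    name: str
    source: str
    monitored_field: Optional[str] = None       # field the validator tracks
    filter_fields: FrozenSet[str] = frozenset()  # fields used in its filter


def referenced_fields(validator: ValidatorConfig) -> Set[str]:
    """Return every field the validator monitors or filters on."""
    fields = set(validator.filter_fields)
    if validator.monitored_field is not None:
        fields.add(validator.monitored_field)
    return fields


def shared_fields(a: ValidatorConfig, b: ValidatorConfig) -> Set[str]:
    """Return the fields both validators reference (assumes same source)."""
    if a.source != b.source:
        return set()
    return referenced_fields(a) & referenced_fields(b)


# One validator monitors a field that the other uses in its filter.
mean_sales = ValidatorConfig(
    name="mean_total_sales",
    source="orders",
    monitored_field="total_sales_amount",
)
filtered_count = ValidatorConfig(
    name="row_count_large_orders",
    source="orders",
    filter_fields=frozenset({"total_sales_amount"}),
)
overlap = shared_fields(mean_sales, filtered_count)
```

In this sketch, a non-empty `overlap` would mark the two validators' incident groups as candidates for a "correlates with" relationship. How Validio actually weighs shared fields is not described in this document.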

The following table lists the different RCA relationships with examples illustrating how Validio identifies them.

| RCA Relationship | Example |
| --- | --- |
| A related Freshness validator incident in the same source. | A row count validator throws an incident at the same time that a freshness validator doesn’t receive data. |
| A related Row Count validator in the same source. | A row count validator that drops significantly and a mean validator tracking a numeric field that shifts at the same time. |
| A related validator referencing the same field. | A numeric validator tracking the mean of a field and another numeric validator tracking the maximum of the same field. |
| A related validator referencing the upstream field. | A validator tracking the mean of total_sales_amount on an upstream source and the mean of total_sales_amount on a source directly downstream. |
| A related validator referencing the downstream field. | A validator tracking the mean of total_sales_amount on a downstream source and the mean of total_sales_amount on a source directly upstream. |
| A related Freshness validator incident on an upstream source. | A freshness validator threw an incident upstream and likely caused the current freshness validator incident. |
| A related Freshness validator incident on a downstream source. | Freshness validators downstream that are throwing incidents as a result of the current freshness validator incident. |

For more information, see About Validator Incidents and Managing Incidents.