
Investigating and Triaging Critical Incidents

This guide is part of the Getting Started Tutorial. For more information, see the Tutorial Overview.

In the previous guide, Setting up Monitoring and Validation, you configured your demo environment to monitor sources and added validators to each source. In this guide, you will investigate the incidents that the validators detected and triage critical incidents using Validio's automated Root Cause Analysis feature.

Investigate Incidents

An incident is a data quality issue captured by a Validator based on the thresholds configured on that validator. Incidents can have a severity of High, Medium, or Low. For more information, see About Validator Incidents.
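
As a rough mental model only, threshold-based incident detection can be sketched as below. This is not Validio's implementation or API: the `Incident` class, the `classify` function, and especially the rule that maps the size of the threshold breach to a severity are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    validator: str
    value: float
    severity: str  # "High", "Medium", or "Low"


def classify(validator: str, value: float, lower: float, upper: float) -> Optional[Incident]:
    """Return an Incident if `value` falls outside [lower, upper], else None.

    The severity cut-offs are hypothetical: they compare how far the value
    overshoots the bound with the width of the allowed range.
    """
    if lower <= value <= upper:
        return None  # within thresholds, so no incident
    width = upper - lower
    overshoot = (lower - value) if value < lower else (value - upper)
    ratio = overshoot / width if width > 0 else float("inf")
    if ratio >= 1.0:
        severity = "High"
    elif ratio >= 0.5:
        severity = "Medium"
    else:
        severity = "Low"
    return Incident(validator, value, severity)


# Hypothetical thresholds of 0 to 100 for a daily crash count.
incident = classify('Sum of "nr_of_crashes_daily"', 160, lower=0, upper=100)
print(incident)
```

In the product, thresholds and severities come from the validator configuration; the sketch only shows that an incident is a metric value that breaches the validator's bounds.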

To view incidents detected on the gold_daily_management_data source:

  1. Navigate to the source and open the Validators tab.

    The histogram displays incidents detected on the source in the last month. The table lists the validators, sorted by their most recent and highest-severity incidents.

  2. Navigate to the Sum of "nr_of_crashes_daily" validator.
    You will see all the incidents detected on this validator during the last month.

  3. Review the incidents based on the country segmentation that you created for this validator.

📘

Note

Always update the status of an incident after you investigate and resolve it. Changing the status of an incident also helps retrain the anomaly detection model and improve its accuracy. For more information, see Model Retraining and Managing Incidents.

Triage with Root Cause Analysis

Validio Root Cause Analysis (RCA) automatically groups incidents that occur around the same time on a validator to find the causal and correlational relationships among incidents within the source and across multiple sources. Validio RCA combines detailed data lineage with data quality monitoring to provide an analysis that explains why the incident occurred, relates the incident to other incidents in your catalog, and describes how the incident might affect other catalog assets.
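
The idea of grouping incidents that occur around the same time can be illustrated with a minimal sketch. It assumes a simple rule (an incident joins the current group if it occurs within a fixed window of the previous incident); the function name, the one-hour window, and the sample timestamps are hypothetical, and Validio's actual grouping logic is not documented here.

```python
from datetime import datetime, timedelta


def group_by_time(incidents, window=timedelta(hours=1)):
    """Group (incident_id, timestamp) pairs into time-adjacent clusters.

    An incident joins the current group when it occurs within `window`
    of the previous incident; otherwise it starts a new group.
    """
    groups = []
    for incident in sorted(incidents, key=lambda pair: pair[1]):
        if groups and incident[1] - groups[-1][-1][1] <= window:
            groups[-1].append(incident)
        else:
            groups.append([incident])
    return groups


# Hypothetical incidents on one validator.
sample = [
    ("inc-1", datetime(2024, 1, 1, 10, 0)),
    ("inc-2", datetime(2024, 1, 1, 10, 30)),
    ("inc-3", datetime(2024, 1, 1, 14, 0)),
]
for group in group_by_time(sample):
    print([incident_id for incident_id, _ in group])
```

With this sample, the first two incidents fall into one group and the third forms its own group.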

📘

Note

Validio RCA works only on sources that have validators. Apart from that requirement, RCA needs no additional configuration to create incidents and track incident groups.

To triage critical incidents on the Sum of "nr_of_crashes_daily" validator:

  1. Click the menu on the top incident.

  2. Select View group details to open the incident group details Overview.
    You can use the Overview tab to debug the SQL query for the incident. Loading samples will generate a list of sample rows for the detected incidents. This view ranks the anomalies by their deviation from the expected values.

  3. Select the Root Cause tab to review Validio's analysis of the incident group.
    Validio uses information from the incident group, such as the source and validator configurations and field-level lineage, to identify causal and correlational relationships. Because Validio understands how data flows through the system and what types of events can cause others, it can identify whether an incident group is “caused by” or “correlates with” another incident group.

    The lineage map displays the end-to-end data flow with the current incident as the starting point. The relationships can help you to understand the upstream root cause and downstream impact of the incident.
    The table lists each of the incidents and provides a description of the RCA relationship to help explain the root cause of the incident.
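
To make the role of lineage in step 3 more concrete, here is a deliberately simplified sketch. It assumes a naive rule: if another incident sits on an asset upstream of the current one, label the relationship “caused by”; otherwise label it “correlates with”. The lineage graph, the asset names other than gold_daily_management_data, and both functions are hypothetical. Validio's analysis also draws on source and validator configurations, which this sketch ignores.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets directly downstream of it.
LINEAGE = {
    "raw_events": ["silver_crashes"],
    "silver_crashes": ["gold_daily_management_data"],
    "gold_daily_management_data": [],
}


def is_upstream(lineage, candidate, target):
    """Return True if `target` is reachable from `candidate` via downstream edges."""
    seen = {candidate}
    queue = deque([candidate])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child == target:
                return True
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return False


def relationship(lineage, incident_asset, other_asset):
    """Label how an incident on `other_asset` relates to one on `incident_asset`."""
    if is_upstream(lineage, other_asset, incident_asset):
        return "caused by"
    return "correlates with"


print(relationship(LINEAGE, "gold_daily_management_data", "raw_events"))
print(relationship(LINEAGE, "raw_events", "gold_daily_management_data"))
```

The breadth-first search mirrors reading the lineage map from an upstream asset toward the current incident: an upstream incident is a candidate root cause, while a downstream one indicates impact.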

For more information, see Root Cause Analysis.


What’s Next

In this guide, you learned how to investigate incidents that Validio detects and how to triage critical incidents with Validio's automated Root Cause Analysis feature. Next, you will set up notification rules and channels to alert on future incidents.