Managing Incidents

You can review and manage the incidents in an incident group from its Group details page. The details page for each incident group contains the following tabs:

  • Overview–Provides a comprehensive summary of the incident, including the current status and owner, with a graph showing the validator metric values over time.
  • Root Cause–Provides an analysis of the incident, with detailed lineage that shows where the incident occurred, its likely upstream cause, and its downstream impacts, to help you troubleshoot and resolve the incident.
  • Past Groups–Provides a list of past occurrences of similar incident groups, to give context on how often the same incident has been seen and whether it happens at a regular frequency. You can also use this tab to perform batch operations on all similar incident groups.

📘

Note

If you have notification rules to track when an incident occurs, the notification includes a link directly to the Incident details page where you can manage it.

Group Overview Tab

The Group details > Overview tab provides a comprehensive summary of the incident with a graph of the validator metric values over time and a table of the individual incidents.

The group summary includes the following information:

Field | Description
Status | The most urgent status among the incidents in the group: Triage, Investigating, Resolved, or False Positive.
Priority | The priority (High, Medium, Low) is automatically determined based on the severity of the incidents in the group and how long the incident has been ongoing.
Owner | The username of any owner assigned to the incident.
First Seen | The date when the first incident in the group occurred.
Last Seen | The date when the last incident in the group occurred.
Source | The source where the incident occurred. You can click on the source to navigate to its details page.
Validator | The validator and metric that captured the incident. You can click on the validator to navigate to its details page.
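
As an illustration of how a group-level status could be derived from the individual incidents, here is a minimal Python sketch. The urgency order is an assumption taken from the order in which the statuses are listed above, and the names `URGENCY_ORDER` and `group_status` are hypothetical; this is not Validio's implementation.

```python
# Assumed urgency order, most urgent first. This ordering is an assumption
# based on how the statuses are listed; it is not confirmed by the product.
URGENCY_ORDER = ["Triage", "Investigating", "Resolved", "False Positive"]


def group_status(incident_statuses):
    """Return the most urgent status among the incidents in a group."""
    # A lower index in URGENCY_ORDER means a more urgent status.
    return min(incident_statuses, key=URGENCY_ORDER.index)


print(group_status(["Resolved", "Investigating", "False Positive"]))  # Investigating
```

Under this assumption, a group shows Resolved only when none of its incidents are still in Triage or Investigating.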

Metric Graph

The metric graph displays a history of the field values tracked by the validator. You can see when the incident occurred and the values before and after the incident.

The graph includes information about the severity of the incidents (High, Medium, Low) and a count of the occurrences of each severity. When you hover on a datapoint in the graph, a tooltip will display the time that the incident occurred, its Value, and its Upper and Lower boundaries.

Incident Table

The incident table lists the individual incidents in the group and includes the following information:

Column Name | Description
Value | The value of the validator metric that caused the incident.
Deviation | The prominence of the incident, defined as the difference between Value and the breached boundary.
Status | The progress of the incident resolution: Triage, Investigating, Resolved, or False Positive.
Severity | The severity of the incident: High, Medium, Low.
Seen At | Relative time when the incident was seen.
Reported At | Relative time when the incident was reported.
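
The Deviation column can be illustrated with a small calculation. The following Python sketch assumes the deviation is reported as a non-negative distance from whichever boundary was breached; the function name `deviation` and the sample numbers are hypothetical, and the sign convention in the product may differ.

```python
from typing import Optional


def deviation(value: float, lower: Optional[float], upper: Optional[float]) -> float:
    """Distance between a metric value and the boundary it breached.

    Returns 0.0 when the value lies within the boundaries.
    A boundary of None means that side is unbounded.
    """
    if upper is not None and value > upper:
        return value - upper  # breached the upper boundary
    if lower is not None and value < lower:
        return lower - value  # breached the lower boundary
    return 0.0


print(deviation(120.0, lower=50.0, upper=100.0))  # 20.0
print(deviation(35.0, lower=50.0, upper=100.0))   # 15.0
print(deviation(75.0, lower=50.0, upper=100.0))   # 0.0
```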

Update the Incident Status

You can use the incident status to track the progress of the incident resolution and retrain the anomaly detection algorithms.

To change the status of an incident:

  1. Check the box for each incident you want to update.
  2. Click Update Status.
  3. Select the new status to apply to all the selected incidents.

The following table lists the available status options:

Status | Description
Triage | The default status for new incidents. Indicates that the incident requires review.
Investigating | The incident is currently being addressed.
Resolved | The incident has been resolved.
False Positive | The incident has been addressed and is not an anomaly.

Debug an Incident

You can use the Debug button to find information to help you troubleshoot the incident. The information that you see depends on the type of source. Debug is not available for all source types.

Data Warehouses and Query Engine Sources

When you debug an incident from a data warehouse or query engine source, you use an automatically generated SQL query that:

  • Filters data down to the specific Window and Segment where the incident occurred.
  • Orders records based on how far away they are from being allowed, returning the most prominent outliers at the top.

📘

Note

Although outliers are not the only cause of incidents, they are a good starting point for troubleshooting.

You can use Load Samples to upload sample data and view the query results in the Debug window. Alternatively, you can copy the query text into your preferred developer environment and modify it as needed.
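
To give a sense of what a query of this shape does, here is a self-contained Python sketch that runs a similar query against an in-memory SQLite database. The table `orders`, its columns, the boundaries, and the window and segment values are all hypothetical, and the SQL that Validio generates for your source will differ. The sketch only shows the two steps described above: filtering to one window and segment, then ordering records by their distance from the allowed range.

```python
import sqlite3

# Hypothetical sample data; real sources and column names will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (event_time TEXT, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2024-01-01T10:05:00", "SE", 95.0),
        ("2024-01-01T10:20:00", "SE", 480.0),
        ("2024-01-01T10:40:00", "SE", 130.0),
        ("2024-01-01T10:45:00", "NO", 900.0),  # different segment, filtered out
        ("2024-01-01T11:10:00", "SE", 700.0),  # different window, filtered out
    ],
)

# Filter to the window and segment of the incident, then sort so that the
# records furthest outside the allowed range come first.
DEBUG_QUERY = """
SELECT event_time, country, amount,
       CASE
           WHEN amount > :upper THEN amount - :upper
           WHEN amount < :lower THEN :lower - amount
           ELSE 0
       END AS distance
FROM orders
WHERE event_time >= :window_start
  AND event_time < :window_end
  AND country = :segment
ORDER BY distance DESC
"""

params = {
    "lower": 50.0,
    "upper": 100.0,
    "window_start": "2024-01-01T10:00:00",
    "window_end": "2024-01-01T11:00:00",
    "segment": "SE",
}

rows = conn.execute(DEBUG_QUERY, params).fetchall()
for row in rows:
    print(row)
```

With this sample data, the record with amount 480.0 is listed first because it lies furthest above the assumed upper boundary of 100.0.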

📘

Note

The SQL query is available only for sources where Validio uses SQL pushdown for validation, which is common for data warehouses and query engines.

Object Storage Sources

When you debug an incident from an object storage source, you will see a section called Bucket, which contains information about the bucket or files where the incident was found. You can navigate to, download, or copy the link to the files for troubleshooting.

Root Cause Tab

The Root Cause tab provides an analysis of the current incident group to help you troubleshoot and resolve the incident. Root cause analysis uses data lineage to trace where the incident occurs, what causes it, and how it affects related upstream and downstream assets. Validio uses information from these incident groups to identify causal and correlational relationships among them.

For more information, see Root Cause Analysis.