Managing Incidents
You can review and manage individual incidents from the Group Details page of their incident group. The details page for each incident group contains the following tabs:
- Overview–Provides a comprehensive summary of the incident, including the current status and owner, with a graph showing the validator metric values over time.
- Root Cause–Provides an analysis of the incident, with detailed lineage to show where the incident occurred, its likely upstream cause and downstream impacts, to help you troubleshoot and resolve the incident.
- Past Groups–Provides a list of past occurrences of similar incident groups, to give context on how often the same incident has been seen and whether it happens at a regular frequency. You can also use this tab to perform batch operations on all similar incident groups.
Note
If you have notification rules to track when an incident occurs, the notification includes a link directly to the Incident details page where you can manage it.
Group Overview Tab
The Group details > Overview tab provides a comprehensive summary of the incident with a graph of the validator metric values over time and a table of the individual incidents.
The group summary includes the following information:
Field | Description |
---|---|
Status | The most urgent status among the incidents in the group: Triage, Investigating, Resolved, or False Positive. |
Priority | The priority (High, Medium, Low) is automatically determined based on the severity of the incidents in the group and how long the incident has been ongoing. |
Owner | The username of any owner assigned to the incident. |
First Seen | The date when the first incident in the group occurred. |
Last Seen | The date when the last incident in the group occurred. |
Source | The source where the incident occurred. You can click on the source to navigate to its details page. |
Validator | The validator and metric that captured the incident. You can click on the validator to navigate to its details page. |
Metric Graph
The metric graph displays a history of the field values tracked by the validator. You can see when the incident occurred and the values before and after the incident.
The graph includes information about the severity of the incidents (High, Medium, Low) and a count of the occurrences of each severity. When you hover on a datapoint in the graph, a tooltip will display the time that the incident occurred, its Value, and its Upper and Lower boundaries.
Incident Table
The incident table lists the individual incidents in the group and includes the following information:
Column Name | Description |
---|---|
Value | The value of the validator metric that caused the incident. |
Deviation | The prominence of the incident, defined as the difference between Value and the breached boundary. |
Status | The progress of the incident resolution: Triage, Investigating, Resolved, or False Positive. |
Severity | The severity of the incident: High, Medium, Low. |
Seen At | Relative time when the incident was seen. |
Reported At | Relative time when the incident was reported. |
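The Deviation column can be illustrated with a small calculation. The following is a minimal sketch, not Validio's implementation: the function name `deviation` is hypothetical, and it assumes the deviation is reported as a non-negative distance from whichever boundary (Upper or Lower) the Value breached.

```python
from typing import Optional


def deviation(value: float, lower: float, upper: float) -> Optional[float]:
    """Distance between `value` and the boundary it breached.

    Assumption: the result is a non-negative magnitude. Returns None when
    the value lies within [lower, upper], since no boundary is breached.
    """
    if value > upper:
        return value - upper  # breached the upper boundary
    if value < lower:
        return lower - value  # breached the lower boundary
    return None


# A value of 120 against boundaries [50, 100] breaches the upper boundary by 20.
print(deviation(120.0, 50.0, 100.0))
```

Under these assumptions, a larger result means a more prominent incident.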
Update the Incident Status
You can use the incident status to track the progress of the incident resolution and retrain the anomaly detection algorithms.
To change the status of an incident:
- Check the box for each incident you want to update.
- Click Update Status.
- Select the new status to apply to all the selected incidents.
The following table lists the available status options:
Status | Description |
---|---|
Triage | The default status for new incidents. Indicates that the incident requires review. |
Investigating | The incident is currently being addressed. |
Resolved | The incident has been resolved. |
False Positive | The incident has been addressed and is not an anomaly. |
Note
Changing the status of a detected incident to False Positive provides feedback to retrain the anomaly detection algorithms, so that they are less likely to wrongly flag similar data points as incidents in the future. This feedback cannot be undone. For more information, see Model Retraining.
Debug an Incident
You can use the Debug button to find information to help you troubleshoot the incident. The information that you see depends on the type of source. Debug is not available for all source types.
Data Warehouses and Query Engine Sources
When you debug an incident from a data warehouse or query engine source, you use an automatically generated SQL query that:
- Filters data down to the specific Window and Segment where the incident occurred.
- Orders records by how far they fall outside the allowed range, returning the most prominent outliers at the top.
Note
Although outliers are not the only cause of incidents, they are a good starting point for troubleshooting.
You can use Load Samples to upload sample data and view the query results in the Debug window. Alternatively, you can copy the query text into your preferred developer environment and modify it as needed.
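To show the shape of such a debug query, here is a minimal sketch that runs a comparable query against an in-memory SQLite database. It is not the SQL that Validio generates: the table `orders`, its columns, the segment value, the window timestamps, and the boundaries are all hypothetical, and SQLite stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical query: filter to one window and segment, then rank rows by
# how far the value falls outside the allowed [lower, upper] range.
DEBUG_QUERY = """
SELECT id, value,
       CASE
           WHEN value > :upper THEN value - :upper
           WHEN value < :lower THEN :lower - value
           ELSE 0
       END AS distance
FROM orders
WHERE country = :segment
  AND created_at >= :window_start
  AND created_at < :window_end
ORDER BY distance DESC
"""


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, country TEXT, created_at TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "SE", "2024-01-01T10:05:00", 95.0),   # in window, within bounds
            (2, "SE", "2024-01-01T10:20:00", 180.0),  # in window, above upper
            (3, "SE", "2024-01-01T10:40:00", 20.0),   # in window, below lower
            (4, "NO", "2024-01-01T10:30:00", 500.0),  # different segment
            (5, "SE", "2024-01-01T11:10:00", 400.0),  # outside the window
        ],
    )
    return conn


params = {
    "segment": "SE",
    "window_start": "2024-01-01T10:00:00",
    "window_end": "2024-01-01T11:00:00",
    "lower": 50.0,
    "upper": 100.0,
}
rows = build_demo_connection().execute(DEBUG_QUERY, params).fetchall()
for row in rows:
    print(row)
```

In this sketch, rows from other segments or windows are excluded, and the remaining rows are listed with the most prominent outlier first, which mirrors the two behaviors described above.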
Note
The SQL query is only available for debugging sources where Validio uses SQL pushdown for validation, which is common for data warehouses and query engines.
Object Storage Sources
When you debug an incident from an object storage source, you will see a section called Bucket, which contains information about the bucket or files where the incident was found. You can navigate to, download, or copy the link to the files for troubleshooting.
Root Cause Tab
The Root Cause tab provides an analysis of the current incident group to help you troubleshoot and resolve the incident. Root cause uses data lineage to trace where the incident occurs, what causes it, and its impacts on related upstream and downstream assets. Validio uses information from these incident groups to identify causal and correlational relationships among them.
For more information, see Root Cause Analysis.