Concepts and Terminology

This page gives an overview of the key concepts used across the Validio platform.

Key Concepts

| Concept | Description |
| --- | --- |
| Credentials | Used to access and authenticate to data sources. One set of credentials can be used to set up multiple Sources. |
| Source | Connects Validio to one source system, such as a Data Warehouse, a Data Stream, or an Object Storage. A Source is defined as one table in a Data Warehouse or one topic in a Data Stream. Segmentations, Windows, and Validators are defined for each Source. |
| Validator | The component responsible for monitoring and validating the data in your Sources. A Validator defines which metric to calculate over a subset, or Window, of the data in a Source, which fields to monitor, and which thresholds are considered acceptable. Each Source can have one or more Validators. |
| Window | A time interval or bucket of data. There is also a Global Window, which considers all data in the Source. Each Source must have at least one Window, and can have several. |
| Segmentation | Allows validation per segment, also referred to as a group. You can think of this as a GROUP BY clause in SQL. Each Source has at least one segment. The default Segmentation is called Unsegmented. |
| Notification rule | Sends incidents to specified channels, such as Slack. Each notification rule can include incidents from multiple Sources. |
| Notification channel | Each notification rule has a notification channel attached. The same channel can be used for multiple notification rules. |
| Lineage | Describes how data flows through a data stack, from its origin to its final use. For some source types, Lineage is created automatically based on a Credential or a dbt manifest file. For others, Lineage can be created manually, based on a Source. |
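
To make the relationship between Windows, Segmentations, and Validators more concrete, here is a minimal sketch in Python. It is not Validio's implementation or API. The records, the field names (`country`, `amount`), the daily window, and the threshold values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

# Hypothetical records from one Source: (timestamp, country, amount).
records = [
    (datetime(2024, 1, 1, 9, 0), "SE", 10.0),
    (datetime(2024, 1, 1, 15, 0), "SE", 20.0),
    (datetime(2024, 1, 1, 11, 0), "US", 5.0),
    (datetime(2024, 1, 2, 8, 0), "SE", 30.0),
]


def mean_per_window_and_segment(rows):
    """Group rows by (daily window, segment) and return the mean amount per group.

    The grouping by segment plays the role of GROUP BY country in SQL;
    the grouping by calendar day plays the role of a daily Window.
    """
    groups = defaultdict(list)
    for timestamp, segment, amount in rows:
        window = timestamp.date()  # assumed daily window
        groups[(window, segment)].append(amount)
    return {key: mean(values) for key, values in groups.items()}


def is_acceptable(value, lower, upper):
    """Return True when the metric lies inside the (hypothetical) thresholds."""
    return lower <= value <= upper


metrics = mean_per_window_and_segment(records)
for (window, segment), value in sorted(metrics.items()):
    status = "ok" if is_acceptable(value, lower=8.0, upper=25.0) else "outside thresholds"
    print(window, segment, value, status)
```

Each (window, segment) pair gets its own metric value, which is then compared against the thresholds independently of the other pairs.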

Data Quality Score

The Data quality score measures data quality and is calculated for each Segment, Validator, and Source. On the Overview page, the quality score is presented as a percentage that shows the overall data quality across all of your Sources, averaged over a time range.

The quality score is calculated from the number of incidents that occurred during the selected time interval, relative to the total number of artifacts:

quality = 100 * (total_artifact_count - incident_count) / total_artifact_count

A quality score of 100% means that no incidents occurred during the time range. A quality score of 0% means that all monitored metrics are causing high-severity incidents.
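
As a worked example, the formula above can be written as a small Python function. This is a sketch of the arithmetic only: the function name and the guard against a zero artifact count are assumptions, and the way Validio counts artifacts and incidents is not specified here.

```python
def quality_score(total_artifact_count: int, incident_count: int) -> float:
    """Return the quality score as a percentage, following the formula above."""
    if total_artifact_count <= 0:
        # Assumption: the score is undefined when nothing is monitored.
        raise ValueError("total_artifact_count must be positive")
    return 100 * (total_artifact_count - incident_count) / total_artifact_count


# 2 incidents across 20 artifacts give a score of 90%.
print(quality_score(total_artifact_count=20, incident_count=2))
```

With no incidents the function returns 100.0, and when `incident_count` equals `total_artifact_count` it returns 0.0, matching the two boundary cases described above.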

Validio uses the following color scheme when the data quality score is displayed in graphs and tables:

| Color | Data Quality Score |
| --- | --- |
| Green | 90% or above |
| Yellow | 60% or above, but below 90% |
| Red | Below 60% |
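
The color bands can be expressed as a simple threshold lookup. The sketch below is illustrative, and the function name is hypothetical. It assumes that a score of exactly 60% is shown as yellow, which follows from red being defined as strictly below 60%.

```python
def score_color(score: float) -> str:
    """Map a quality score (0 to 100) to the color band described above."""
    if score >= 90:
        return "green"
    if score >= 60:  # assumption: exactly 60% counts as yellow
        return "yellow"
    return "red"


for example in (95.0, 90.0, 75.0, 60.0, 59.9):
    print(example, score_color(example))
```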

Time to Resolution Metrics

The Time to resolution graph displays a summary of the current statuses of incidents and the incident resolution over time. The Time to resolution metric is calculated when you change the status of at least one incident to False Positive or Resolved. For days on which you did not update the status of any incident, the point is shown on the graph as an open dot, and the tooltip displays a "-" to indicate that the Time to resolution value is not available. The granularity of the graph depends on the time range settings of the view.

Reads and Writes Metrics

Validio monitors interaction with all configured sources and tracks the usage and performance as Reads and Writes. You can view these metrics on the Source Overview tab.

| Metric | Description |
| --- | --- |
| Reads | The number of times the source was accessed in the last 30 days, for example the number of SELECT queries. (Table views also count as reads.) |
| Writes | The number of times the source was modified in the last 30 days. This includes queries such as CREATE, UPDATE, DELETE, PUT, INSERT, MERGE, and TRUNCATE. |
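
To illustrate what these two counters represent, the following sketch tallies reads and writes from a hypothetical query log over a 30-day lookback. It is not how Validio collects these metrics. The log format, the function name, and the classification by leading keyword are assumptions, and the keyword set covers only the examples listed above.

```python
from datetime import datetime, timedelta

# Keywords taken from the examples above; the real list may be longer.
READ_KEYWORDS = {"SELECT"}
WRITE_KEYWORDS = {"CREATE", "UPDATE", "DELETE", "PUT", "INSERT", "MERGE", "TRUNCATE"}


def count_reads_and_writes(query_log, now, days=30):
    """Count read and write statements executed within the last `days` days.

    `query_log` is a hypothetical iterable of (executed_at, statement) pairs.
    """
    cutoff = now - timedelta(days=days)
    reads = writes = 0
    for executed_at, statement in query_log:
        if executed_at < cutoff:
            continue  # outside the lookback period
        parts = statement.split()
        keyword = parts[0].upper() if parts else ""
        if keyword in READ_KEYWORDS:
            reads += 1
        elif keyword in WRITE_KEYWORDS:
            writes += 1
    return reads, writes


example_log = [
    (datetime(2024, 2, 20), "SELECT * FROM orders"),
    (datetime(2024, 2, 25), "insert into orders values (1)"),
    (datetime(2024, 1, 1), "SELECT * FROM orders"),  # older than 30 days
]
print(count_reads_and_writes(example_log, now=datetime(2024, 3, 1)))
```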