Key Concepts
The table below gives an overview of the key concepts used across the Validio platform. Each concept governs its own logical part of the platform and has a dedicated configuration wizard for set-up.
Map of the territory
We recommend using this page as a reference rather than as the one page that explains it all. Come back here if you’re deep into a specific detail page and need a bird’s-eye view again!
Key Concepts Overview
Concept category | Concept name | What is configured | Description |
---|---|---|---|
Connectors | Sources | Authentication to read the source and the features to ingest | Connector to a data source, enabling ingestion of the data to be monitored and validated. |
Connectors | Destinations | Authentication to write datapoint errors to a destination | Connector to a data destination, enabling real-time egress of data errors caught in Datapoint pipelines. |
Pipelines | Dataset pipelines | Batch logic, partitions, and notification rules | Configures how a batch dataset is defined, governing how dataset metrics are calculated (e.g. defining the dataset a mean is calculated on). Defines partitions, enabling metrics to be tracked per partition of a dataset (e.g. divide a dataset into partitions based on the categorical ‘country’ feature, so that the mean of the ‘price’ feature can be calculated per country). Applies notification rules to the pipeline (created separately in the Notification wizard, see below). |
Pipelines | Datapoint pipelines | Ingestion logic, partitions, and notification rules | Configures the pipeline’s ingestion logic (refer to the datapoint page for details). Defines partitions, enabling metrics to be tracked on partitions of datasets. Applies notification rules to the pipeline (created separately in the Notification wizard). |
Notifications | Notifications | Rules and channels for alert notifications | Defines notification rules for how alerts are grouped before being sent through to a channel; this exists to mitigate alert fatigue. Configures and sets up notification channels. |
Monitors and filters | Monitors | Dataset metrics to define and track on a specific feature | A monitor calculates and tracks a specific dataset metric (e.g. the mean of the ‘age’ feature). Monitors are only applicable to Dataset pipelines. Multiple monitors can be configured for one Dataset pipeline, allowing multiple metrics to be calculated and tracked. |
Monitors and filters | Filters | Filters evaluating individual datapoints on a specific feature | Evaluates individual datapoints against specified filter logic (e.g. filter all datapoints with a value between 0 and 18 in the ‘age’ feature). Filters are only applicable to Datapoint pipelines. Multiple filters can be configured for one Datapoint pipeline. |
Alerts | Alerts | Logic for what is considered erroneous data and should be alerted on | For monitors: defines when a metric tracked by a monitor indicates erroneous data and should be alerted on (e.g. alert when the minimum value of the ‘price’ feature in a dataset is below 0). For filters: defines when datapoints passing (or not passing) a filter should be alerted on (e.g. alert when more than 10 datapoints ingested in the last micro-batch did not pass the filter). All alerts are visible in the UI and are grouped according to notification rules before being sent through the notification channel. |
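To make the examples in the table concrete, here is a minimal sketch in plain Python. This is not the Validio API or configuration format; the micro-batch data, feature names, and thresholds are invented for illustration, following the examples used above.

```python
from statistics import mean

# An invented micro-batch of datapoints; 'country', 'price', and 'age'
# are the example features used in the table above.
batch = [
    {"country": "SE", "price": 120.0, "age": 34},
    {"country": "SE", "price": -5.0, "age": 17},
    {"country": "NO", "price": 89.0, "age": 52},
]

# Dataset pipeline + Monitor: partition the dataset on the categorical
# 'country' feature, then track the mean of 'price' per partition.
partitions: dict[str, list[float]] = {}
for point in batch:
    partitions.setdefault(point["country"], []).append(point["price"])
mean_price_by_country = {c: mean(vals) for c, vals in partitions.items()}
# {'SE': 57.5, 'NO': 89.0}

# Alert on a monitor: erroneous data when min('price') is below 0.
monitor_alert = min(p["price"] for p in batch) < 0  # True here

# Datapoint pipeline + Filter: evaluate each individual datapoint,
# catching all datapoints with 'age' between 0 and 18.
caught = [p for p in batch if 0 <= p["age"] <= 18]

# Alert on a filter: fire when more than 10 datapoints in the last
# micro-batch did not pass the filter.
filter_alert = len(caught) > 10  # False here
```

In Validio itself, the same logic is expressed through the respective configuration wizards rather than written as code.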
Key Concepts relationships
The diagram below depicts the relationships between the different key concepts. Note that it is not a diagram of the services running Validio, but rather an illustration and mental model of the key concepts for the user.
Concept cardinality
- Each Source Connector connects to one external 3rd-party integration (DWH, object store, or streaming source). One Source is defined as one table in a DWH, one topic (or equivalent) in a stream, or a bucket (folder) in an object store. The same 3rd-party integration can have multiple Source Connectors, e.g. for accessing multiple tables in the same DWH.
- Each Destination Connector connects to one specific destination, defined the same way as a source in a Source Connector. Only Datapoint pipelines support Destination Connectors.
- A Dataset pipeline connects to exactly one Source Connector, i.e. a 1:1 relation to a table in a DWH, a topic (or equivalent) in a stream, or a bucket (folder) in an object store. One Source Connector can have multiple Dataset pipelines.
- A Datapoint pipeline connects to exactly one Source Connector, i.e. a 1:1 relation to a table in a DWH, a topic (or equivalent) in a stream, or a file in an object store.
- A Notification rule can be reused across multiple pipelines, but each pipeline can only have one instance of a Notification rule. Note that notification rules are optional.
- A Monitor belongs to a specific Dataset pipeline. One Dataset pipeline can have multiple Monitors, to track multiple metrics and features.
- A Filter belongs to a specific Datapoint pipeline. One Datapoint pipeline can have multiple Filters, to evaluate multiple conditions and features.
- An Alert pertains to either a specific Monitor or a specific Filter. A Monitor or Filter can have multiple Alerts, although more than one is rarely warranted. These cardinalities are summarized in the sketch below.
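As a mental model, the cardinalities above can also be written down as types. The following is a sketch using plain Python dataclasses; it is not Validio’s actual schema or API, and all class and field names are assumptions made for illustration.

```python
# Hypothetical type sketch (not Validio's schema) making the 1:1 and
# one-to-many relations from the list above explicit.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SourceConnector:
    integration: str   # one external 3rd-party integration (DWH, stream, object store)
    source: str        # one table, topic (or equivalent), or bucket (folder)

@dataclass
class DestinationConnector:
    destination: str   # one specific destination; Datapoint pipelines only

@dataclass
class NotificationRule:
    name: str          # reusable across pipelines; optional per pipeline

@dataclass
class Alert:
    condition: str     # e.g. "min(price) < 0"

@dataclass
class Monitor:         # Dataset pipelines only
    metric: str        # e.g. mean of the 'age' feature
    alerts: List[Alert] = field(default_factory=list)  # rarely more than one

@dataclass
class Filter:          # Datapoint pipelines only
    condition: str     # e.g. 0 <= age <= 18
    alerts: List[Alert] = field(default_factory=list)

@dataclass
class DatasetPipeline:
    source: SourceConnector                                 # exactly one (1:1)
    monitors: List[Monitor] = field(default_factory=list)   # one-to-many
    notification_rule: Optional[NotificationRule] = None    # at most one instance

@dataclass
class DatapointPipeline:
    source: SourceConnector                                 # exactly one (1:1)
    destination: Optional[DestinationConnector] = None
    filters: List[Filter] = field(default_factory=list)     # one-to-many
    notification_rule: Optional[NotificationRule] = None
```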