Key Concepts

The table below gives an overview of the key concepts used across the Validio platform. Each concept governs its own logical part of the platform and has its own wizard for set-up and configuration.

📘

Map of the territory

We recommend using this page as a reference rather than as the one page that explains it all. Come back here whenever you're deep in a detailed page and need a bird's-eye view again!

Key Concepts Overview

| Concept category | Concept name | What to configure | Description |
| --- | --- | --- | --- |
| Connectors | Sources | Authentication to read the source and which features to ingest | Connector to a data source, enabling ingestion of the data to be monitored and validated. |
| Connectors | Destinations | Authentication to write datapoint errors to a destination | Connector to a data destination, enabling real-time egress of data errors caught in datapoint pipelines. |
| Pipelines | Dataset pipelines | Define batch logic and partitions; apply notification rules | Configure how a batch dataset is defined, which governs how dataset metrics are calculated. Example: defining the dataset a mean should be calculated on.<br><br>Define partitions, enabling metrics to be tracked per partition of a dataset. Example: divide a dataset into partitions based on the categorical 'country' feature, enabling e.g. a mean of the 'price' feature per country partition.<br><br>Apply notification rules to the pipeline (created separately in the Notification wizard, see below). |
| Pipelines | Datapoint pipelines | Define ingestion logic and partitions; apply notification rules | Configure the pipeline's ingestion logic (refer to the datapoint page for details).<br><br>Define partitions, enabling metrics to be tracked per partition of a dataset.<br><br>Apply notification rules to the pipeline (created separately in the Notification wizard). |
| Notifications | Notifications | Define the rule and channel for alert notifications | Define notification rules for how alerts are grouped before they are sent to a channel; grouping exists to mitigate alert fatigue.<br><br>Configure and set up notification channels. |
| Monitors and filters | Monitors | Define and track dataset metrics on a specific feature | A monitor calculates and tracks a specific dataset metric. Example: the mean of the 'age' feature.<br><br>Monitors apply only to Dataset pipelines.<br><br>Multiple monitors can be configured for one dataset pipeline, allowing multiple metrics to be calculated and tracked. |
| Monitors and filters | Filters | Define filters evaluating individual datapoints on a specific feature | A filter evaluates individual datapoints against specified filter logic. Example: filter all datapoints with a value between 0 and 18 in the 'age' feature.<br><br>Filters apply only to Datapoint pipelines.<br><br>Multiple filters can be configured for one datapoint pipeline. |
| Alerts | Alerts | Define the logic for what is considered erroneous data and should be alerted on | For monitors: define when a metric tracked by a monitor should be considered erroneous and alerted on. Example: alert when the minimum of the 'price' feature in a dataset is below 0.<br><br>For filters: define when datapoints passing (or not passing) a filter should be alerted on. Example: alert when more than 10 datapoints ingested in the last micro-batch did not pass the filter.<br><br>All alerts are visible in the UI and are grouped according to notification rules before being sent through the notification channel. |
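The monitor, filter, and alert examples above can be sketched in plain Python. This is an illustrative sketch of the underlying ideas only — the function and threshold names are hypothetical and are not Validio's API:

```python
# Illustrative sketch of the monitor/filter/alert concepts.
# All names here are hypothetical, not Validio's actual API.
from collections import defaultdict

dataset = [
    {"country": "SE", "price": 120.0, "age": 34},
    {"country": "SE", "price": 80.0,  "age": 12},
    {"country": "NO", "price": -5.0,  "age": 51},
    {"country": "NO", "price": 60.0,  "age": 17},
]

# Monitor on a dataset pipeline: mean of 'price' per 'country' partition.
def partitioned_mean(rows, partition_key, feature):
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = sums[row[partition_key]]
        acc[0] += row[feature]
        acc[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

means = partitioned_mean(dataset, "country", "price")
# SE -> 100.0, NO -> 27.5

# Filter on a datapoint pipeline: flag datapoints with 'age' between 0 and 18.
def age_filter(row):
    return 0 <= row["age"] <= 18

failing = [row for row in dataset if age_filter(row)]  # 2 datapoints match

# Alert: trigger when more than N datapoints in a micro-batch match the filter.
ALERT_THRESHOLD = 10
alert = len(failing) > ALERT_THRESHOLD  # False for this tiny batch
```

The key distinction the sketch shows: a monitor aggregates over a dataset (per partition), while a filter and its alert operate on individual datapoints as they are ingested.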

Key Concepts relationships

The diagram below depicts the relationships between the key concepts. Note that it is not a diagram of the services running Validio, but rather an illustration and a mental model of the key concepts for the user.

Concept cardinality

  1. Each Source Connector is connected to one external 3rd-party integration (DWH, object store, or streaming source). One source is defined as one table in a DWH, one topic in a stream (or equivalent), or one bucket (folder) in an object store. The same 3rd-party integration can have multiple Source Connectors, e.g. for accessing multiple tables in the same DWH.

  2. Each Destination Connector is connected to one specific destination, defined the same way as a source above. Only Datapoint pipelines support Destination Connectors.

  3. A Dataset pipeline is connected to exactly one Source Connector, i.e. a 1:1 relation to a table in a DWH, a topic (or equivalent) in a stream, or a bucket (folder) in an object store. One Source Connector can have multiple Dataset pipelines.

  4. A Datapoint pipeline is connected to exactly one Source Connector, i.e. a 1:1 relation to a table in a DWH, a topic (or equivalent) in a stream, or a file in an object store.

  5. A Notification rule can be used across multiple pipelines, but each pipeline can have at most one Notification rule. Note that attaching a rule is optional.

  6. A Monitor is connected to one specific Dataset pipeline. One Dataset pipeline can have multiple Monitors, tracking multiple metrics and features.

  7. A Filter is connected to one specific Datapoint pipeline. One Datapoint pipeline can have multiple Filters, evaluating multiple features and conditions.

  8. An Alert pertains to either a specific Monitor or a specific Filter. A Monitor or Filter can have multiple Alerts, although more than one is rarely warranted.
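The cardinalities above can be summarised as a small type sketch. These are hypothetical, illustrative types only — not Validio's data model — with the numbers in comments referring back to the list above:

```python
# Hypothetical sketch of the concept cardinalities (not Validio's data model).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SourceConnector:        # one per table / topic / bucket, see (1)
    integration: str          # the 3rd-party integration it reads from
    source: str               # table, topic, or bucket (folder)

@dataclass
class NotificationRule:       # reusable across pipelines, see (5)
    name: str

@dataclass
class Monitor:                # belongs to one Dataset pipeline, see (6)
    feature: str
    metric: str               # e.g. "mean"

@dataclass
class Filter:                 # belongs to one Datapoint pipeline, see (7)
    feature: str
    condition: str            # e.g. "0 <= age <= 18"

@dataclass
class DatasetPipeline:        # exactly one Source Connector, see (3)
    source: SourceConnector
    monitors: list[Monitor] = field(default_factory=list)   # many monitors
    notification: Optional[NotificationRule] = None         # at most one, optional

@dataclass
class DatapointPipeline:      # exactly one Source Connector, see (4)
    source: SourceConnector
    filters: list[Filter] = field(default_factory=list)     # many filters
    notification: Optional[NotificationRule] = None         # at most one, optional

# The same Notification rule can be shared by multiple pipelines:
rule = NotificationRule("daily-digest")
orders = SourceConnector("snowflake", "orders")
dataset_pl = DatasetPipeline(orders, [Monitor("price", "mean")], rule)
datapoint_pl = DatapointPipeline(orders, [Filter("age", "0 <= age <= 18")], rule)
```

Note how the optional, single-valued `notification` field and the list-valued `monitors`/`filters` fields mirror the 1:0..1 and 1:N relations in the list above.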