Key Concepts

The table below gives an overview of the key conceptual parts used across the Validio platform. Each concept governs its own specific logical part of the platform and has its own wizard for set-up and configuration.

📘

Map of the territory

We recommend using this page as a reference rather than as the one page that explains it all. Come back here when you're deep into a specific detailed page and need a bird's-eye view again!

Key Concepts Overview

| Concept category | Concept name | What is configured | Description |
| --- | --- | --- | --- |
| Connectors | Sources | Authentication to read the source and definition of the features to ingest | Connector to a data source, enabling ingestion of the data to be monitored and validated |
| Connectors | Destinations | Authentication to write datapoint errors to a destination | Connector to a data destination, enabling real-time egress of data errors caught in Datapoint pipelines |
| Pipelines | Dataset pipelines | Define batch logic and partitions; apply notification rules | Configuration of how a batch dataset is defined, governing how dataset metrics are calculated. Example: defining the dataset a mean is calculated on.<br><br>Define partitions, enabling metrics to be tracked per partition of a dataset. Example: divide a dataset into partitions based on a categorical 'country' feature, enabling e.g. a mean calculation on the 'price' feature for each country partition.<br><br>Apply notification rules to the pipeline (created separately in the Notification wizard, see below) |
| Pipelines | Datapoint pipelines | Define ingestion logic and partitions; apply notification rules | Configure data pipeline ingestion logic (refer to the Datapoint page for details).<br><br>Define partitions, enabling metrics to be tracked on partitions of datasets.<br><br>Apply notification rules to the pipeline (created separately in the Notification wizard) |
| Notifications | Notifications | Define the rule and channel for alert notifications | Define notification rules for how alerts are grouped before being sent through to the channel; exists to mitigate alert fatigue.<br><br>Configure and set up notification channels |
| Monitors and filters | Monitors | Define and track dataset metrics on a specific feature | A monitor calculates and tracks a specific dataset metric. Example: mean of the 'age' feature.<br><br>Monitors are only applicable to Dataset pipelines.<br><br>Multiple monitors can be configured for one Dataset pipeline, allowing multiple metrics to be calculated and tracked |
| Monitors and filters | Filters | Define filters that evaluate individual datapoints on a specific feature | Evaluates individual datapoints based on the specified filter logic. Example: filter all datapoints with a value between 0 and 18 in the 'age' feature.<br><br>Filters are only applicable to Datapoint pipelines.<br><br>Multiple filters can be configured for one Datapoint pipeline, allowing for multiple different filters |
| Alerts | Alerts | Define the logic for what is considered erroneous data and should be alerted on | Monitors: defines when a metric calculated by a monitor should be considered erroneous and alerted on. Example: alert when the min value of the 'price' feature in a dataset is below 0.<br><br>Filters: defines when datapoints passing (or not passing) a filter should be alerted on. Example: alert when more than 10 datapoints ingested in the last micro-batch did not pass the filter.<br><br>All alerts are visible in the UI and are grouped according to notification rules before being sent through the notification channel |
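To make the monitor, filter, and alert concepts above concrete, here is a minimal sketch in plain Python. It is purely illustrative and is not Validio's actual API: the batch contents, function names, and thresholds are all assumptions chosen to mirror the examples in the table (mean of 'price' per 'country' partition, an age filter, and a min-price alert).

```python
from collections import defaultdict

# Hypothetical micro-batch of datapoints (illustrative data, not Validio's data model).
batch = [
    {"country": "SE", "price": 120.0, "age": 34},
    {"country": "SE", "price": -5.0,  "age": 17},
    {"country": "NO", "price": 80.0,  "age": 51},
]

def mean_price_by_country(datapoints):
    """Monitor with partitions: mean of 'price' per 'country' partition."""
    sums, counts = defaultdict(float), defaultdict(int)
    for dp in datapoints:
        sums[dp["country"]] += dp["price"]
        counts[dp["country"]] += 1
    return {country: sums[country] / counts[country] for country in sums}

def age_filter(dp):
    """Filter: match datapoints with an 'age' value between 0 and 18."""
    return 0 <= dp["age"] <= 18

# Alert on the monitor: min 'price' below 0 counts as erroneous data.
metrics = mean_price_by_country(batch)
min_price = min(dp["price"] for dp in batch)
alert_on_monitor = min_price < 0

# Datapoints caught by the filter in this micro-batch.
flagged = [dp for dp in batch if age_filter(dp)]
```

The point of the sketch is the separation of concerns: the monitor aggregates per partition, the filter judges each datapoint individually, and the alert logic sits on top of either.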

Key Concept relationships

The diagram below depicts the relationships between the different key concepts. Note that it is not a diagram of the services running Validio, but rather an illustration and mental model of the key concepts for the user.

Concept cardinality

  1. Each Source Connector is connected to one external 3rd-party integration (DWH, object store, streaming source). One Source is defined as one table in a DWH, one topic (or equivalent) in a stream, or one bucket (folder) in an object store. The same 3rd-party integration can have multiple Source Connectors, e.g. for accessing multiple tables in the same DWH.

  2. Each Destination Connector is connected to one specific destination. A destination is defined in the same way as a Source in a Source Connector. Only Datapoint pipelines support Destination Connectors.

  3. A Dataset pipeline is connected to one specific Source Connector only, i.e. 1:1 relation to a table in a DWH, topic (or equivalent) in a stream, or a bucket (folder) in an object store. One Source connector can have multiple Dataset pipelines.

  4. A Datapoint pipeline is connected to one specific Source Connector only, i.e. 1:1 relation to table in DWH, topic (or equivalent) in a stream, or a file in object store.

  5. A Notification rule can be reused across multiple pipelines. However, each pipeline can have only one instance of a Notification rule. Note that attaching a rule is optional.

  6. A Monitor is connected to a specific Dataset pipeline. One Dataset pipeline can have multiple monitors to track multiple metrics and features

  7. A Filter is connected to a specific Datapoint pipeline. One Datapoint pipeline can have multiple Filters to evaluate multiple conditions and features.

  8. An Alert pertains to either a specific Monitor or a specific Filter. A Monitor or Filter can have multiple Alerts, although more than one is rarely warranted.
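The cardinality rules above can be summarised as a toy object model. This is only a mental-model sketch in Python: the class and field names, and sample values like "analytics.orders", are assumptions for illustration, not Validio's API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SourceConnector:
    """One connector maps to one table, topic, or bucket (rule 1)."""
    integration: str  # e.g. a DWH, stream, or object store
    source: str       # one table, topic (or equivalent), or bucket

@dataclass
class NotificationRule:
    """Reusable across pipelines; at most one instance per pipeline (rule 5)."""
    name: str

@dataclass
class Monitor:
    """Belongs to exactly one Dataset pipeline (rule 6)."""
    metric: str
    alerts: list = field(default_factory=list)  # multiple allowed, usually one (rule 8)

@dataclass
class DatasetPipeline:
    """Connected to exactly one Source Connector (rule 3)."""
    source: SourceConnector
    notification_rule: Optional[NotificationRule] = None  # optional (rule 5)
    monitors: list = field(default_factory=list)          # many monitors (rule 6)

# One Source Connector can back multiple Dataset pipelines (rule 3).
orders = SourceConnector("DWH", "analytics.orders")   # hypothetical names
rule = NotificationRule("hourly-digest")              # hypothetical rule
p1 = DatasetPipeline(orders, rule, [Monitor("mean(price)")])
p2 = DatasetPipeline(orders, rule, [Monitor("min(price)")])
```

Datapoint pipelines and Filters follow the same shape (rules 4 and 7), with Destination Connectors attached only on the Datapoint side (rule 2).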
