Intro to Lineage (BETA)
Lineage allows you to track and visualize how data flows through your data stack, simplifying the management, triaging, and troubleshooting of data quality issues.
Overview
Lineage describes how data flows through the pipelines in a data stack, from its origin to its final use. Lineage is often seen as a map, or graph, of an organization's data assets and the relations between these assets.
A Lineage graph has two fundamental components:
- Catalog assets - Datasets, along with their fields, are referred to as catalog assets. A catalog asset is represented as a node in a Lineage graph.
- Edges - Edges are the connections between fields and datasets in a Lineage graph.
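The two components above can be modeled as a small directed graph. The sketch below is a minimal, hypothetical in-memory model, not Validio's internal representation. The names `CatalogAsset`, `LineageGraph`, and `impacted_by` are assumptions for illustration, as is the upstream table `bronze_sales`. The example also shows why a graph helps with triaging: a breadth-first walk from a problematic asset lists every downstream asset that may be affected.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogAsset:
    """Hypothetical node: a dataset, or a single field within a dataset."""
    dataset: str
    field: Optional[str] = None  # None means the node is the dataset itself


class LineageGraph:
    """Directed graph where an edge (a, b) means data in b is derived from data in a."""

    def __init__(self) -> None:
        self.assets: set[CatalogAsset] = set()
        self.downstream: dict[CatalogAsset, set[CatalogAsset]] = defaultdict(set)

    def add_edge(self, source: CatalogAsset, target: CatalogAsset) -> None:
        # Register both endpoints as nodes, then record the directed edge.
        self.assets.update((source, target))
        self.downstream[source].add(target)

    def impacted_by(self, asset: CatalogAsset) -> set[CatalogAsset]:
        """Return every asset reachable downstream of `asset` (breadth-first search)."""
        seen: set[CatalogAsset] = set()
        queue = deque([asset])
        while queue:
            current = queue.popleft()
            for nxt in self.downstream.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


graph = LineageGraph()
bronze = CatalogAsset("bronze_sales")  # hypothetical upstream table
silver = CatalogAsset("silver_sales")
gold = CatalogAsset("gold__daily_sales_summary")
graph.add_edge(bronze, silver)
graph.add_edge(silver, gold)
print(sorted(a.dataset for a in graph.impacted_by(bronze)))
# ['gold__daily_sales_summary', 'silver_sales']
```

The same traversal run against the reversed edges would list upstream assets instead, which is the direction you follow when looking for the root cause of an issue.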

Lineage graph
Multiple source types
With Validio, you can manage Lineage within and across Data Warehouses, Query Engines, Object Storages, and Data Streams.
Field-level granularity
Validio supports field-level lineage as well as dataset-level lineage.
- Field-level Lineage provides information about how fields relate to other fields and datasets. For example, it shows that data in the column `date` in the table `gold__daily_sales_summary` is generated from data in the column `created_at` in the table `silver_sales`.
- Dataset-level Lineage, on the other hand, only provides information about how datasets relate to other datasets. For example, it shows that data in the table `gold__daily_sales_summary` is generated from data in the table `silver_sales`, without indicating which columns are involved.
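Because every field belongs to a dataset, dataset-level lineage can be derived from field-level lineage by collapsing each field edge onto its parent datasets, but not the other way round. The minimal sketch below represents fields as hypothetical `(dataset, field)` tuples. The `created_at` to `date` edge comes from the example above; the `amount` to `total_amount` edge is made up for illustration.

```python
FieldRef = tuple[str, str]  # hypothetical representation: (dataset, field)


def to_dataset_edges(field_edges: list[tuple[FieldRef, FieldRef]]) -> set[tuple[str, str]]:
    """Collapse field-level edges into dataset-level edges, discarding column detail.

    Edges between two fields of the same dataset are dropped, so the result
    contains no dataset-level self-loops.
    """
    return {(src[0], dst[0]) for src, dst in field_edges if src[0] != dst[0]}


field_edges = [
    (("silver_sales", "created_at"), ("gold__daily_sales_summary", "date")),
    # Hypothetical second column mapping, for illustration only
    (("silver_sales", "amount"), ("gold__daily_sales_summary", "total_amount")),
]
print(to_dataset_edges(field_edges))
# {('silver_sales', 'gold__daily_sales_summary')}
```

Two field-level edges collapse into a single dataset-level edge, which is exactly the information the dataset-level view loses.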
Automatic Lineage
Lineage is added automatically when you complete the setup of a Credential for a Data Warehouse or Query Engine. Validio queries the Information Schema, or its equivalent, to derive information about catalog assets and edges. Validio can also derive this information from a dbt Manifest JSON file.
Automatic Lineage is currently only available for Data Warehouses and Query Engines.
For more information, refer to Automatic Lineage details.
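As an illustration of the kind of information a dbt Manifest JSON file carries (not of how Validio parses it), each entry under `nodes` in dbt's `manifest.json` lists its upstream dependencies under `depends_on.nodes`. The sketch below reads only those keys, skips dbt test nodes, and yields dataset-level edges. The project name `shop` and the trimmed manifest fragment are hypothetical; a real manifest contains many more keys.

```python
import json


def dbt_dataset_edges(manifest: dict) -> set[tuple[str, str]]:
    """Return (upstream_id, downstream_id) pairs from a dbt manifest's depends_on.nodes."""
    edges = set()
    for unique_id, node in manifest.get("nodes", {}).items():
        # dbt tests also declare dependencies, but they are not datasets.
        if node.get("resource_type") == "test":
            continue
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.add((parent_id, unique_id))
    return edges


# A heavily trimmed, hypothetical manifest fragment.
manifest = json.loads("""
{
  "nodes": {
    "model.shop.silver_sales": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.raw.sales"]}
    },
    "model.shop.gold__daily_sales_summary": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.silver_sales"]}
    }
  }
}
""")

for upstream, downstream in sorted(dbt_dataset_edges(manifest)):
    print(f"{upstream} -> {downstream}")
# model.shop.silver_sales -> model.shop.gold__daily_sales_summary
# source.shop.raw.sales -> model.shop.silver_sales
```

To try this on a real project, load the file dbt writes by default to `target/manifest.json` with `json.load` instead of the inline fragment.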
Manual Lineage
You can add Lineage information to Validio manually, both catalog assets and edges:
- Adding catalog assets - Adding a Source also adds its dataset, with its fields, as a catalog asset to the Lineage graph. This does not apply to Data Warehouses and Query Engines, since all of their datasets and fields are already added automatically.
- Adding edges - You can also manually add Lineage edges between existing catalog assets. This is useful, for example, for connecting catalog assets from different data sources.
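The rule that manual edges connect only catalog assets that already exist can be sketched as a simple check. This is a hypothetical local model, not Validio's API: `AssetRef`, `ManualLineage`, and the source and dataset names `object_storage`, `warehouse`, and `raw_orders` are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRef:
    """Hypothetical reference to a catalog asset: its source and dataset name."""
    source: str
    dataset: str


class ManualLineage:
    def __init__(self, assets: set[AssetRef]) -> None:
        self.assets = set(assets)  # catalog assets that already exist
        self.edges: set[tuple[AssetRef, AssetRef]] = set()

    def add_edge(self, upstream: AssetRef, downstream: AssetRef) -> None:
        # Manual edges may only connect catalog assets that already exist.
        missing = [a for a in (upstream, downstream) if a not in self.assets]
        if missing:
            raise ValueError(f"unknown catalog asset(s): {missing}")
        self.edges.add((upstream, downstream))

    def cross_source_edges(self) -> set[tuple[AssetRef, AssetRef]]:
        """Edges whose endpoints live in different sources."""
        return {(u, d) for u, d in self.edges if u.source != d.source}


landing = AssetRef("object_storage", "raw_orders")  # hypothetical Object Storage dataset
silver = AssetRef("warehouse", "silver_sales")
lineage = ManualLineage({landing, silver})
lineage.add_edge(landing, silver)
print(len(lineage.cross_source_edges()))
# 1
```

Rejecting unknown endpoints keeps the graph consistent: an edge never points at a node that the catalog does not contain.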
Create Source from catalog asset
For Data Warehouses and Query Engines, Validio's lineage graph displays all accessible datasets as catalog assets. By clicking a catalog asset, it can be converted to a Source, enabling it for further validation.