
Intro to Lineage (BETA)

Lineage allows you to track and visualize how data flows through your data stack, simplifying the management, triaging, and troubleshooting of data quality issues.

Overview

Lineage describes how data flows through the pipelines in a data stack, from its origin to its final use. Lineage is often seen as a map, or graph, of an organization's data assets and the relations between these assets.

A Lineage graph has two fundamental components:

  • Catalog assets - Datasets, along with their fields, are referred to as catalog assets. A catalog asset is represented as a node in a Lineage graph.
  • Edges - Edges are the connections between fields and datasets in a Lineage graph.

Figure: Lineage graph
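
To make the two components concrete, here is a minimal sketch of a lineage graph as a plain data structure. It is an illustration only, not Validio's internal model or API: the names `CatalogAsset`, `LineageGraph`, `add_edge`, and `downstream_of` are hypothetical, and the sketch assumes edges are directed from upstream to downstream.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CatalogAsset:
    """A node in the graph: a dataset, or one field of a dataset (hypothetical model)."""
    dataset: str
    field_name: Optional[str] = None  # None means the asset is the dataset itself


@dataclass
class LineageGraph:
    """Catalog assets (nodes) plus directed edges from upstream to downstream."""
    assets: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, upstream: CatalogAsset, downstream: CatalogAsset) -> None:
        # Adding an edge also registers both endpoints as catalog assets.
        self.assets.update((upstream, downstream))
        self.edges.add((upstream, downstream))

    def downstream_of(self, asset: CatalogAsset) -> set:
        """Return every asset reachable from `asset` by following edges."""
        seen = set()
        stack = [asset]
        while stack:
            current = stack.pop()
            for up, down in self.edges:
                if up == current and down not in seen:
                    seen.add(down)
                    stack.append(down)
        return seen


graph = LineageGraph()
created_at = CatalogAsset("silver_sales", "created_at")
date = CatalogAsset("gold__daily_sales_summary", "date")
graph.add_edge(created_at, date)
```

Walking the edges downstream from a suspect field, as `downstream_of` does, is the kind of traversal that makes a lineage graph useful when triaging a data quality issue.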

Multiple source types

Using Validio, you can manage lineage within and across Data Warehouses, Query Engines, Object Storages, and Data Streams.

Field-level granularity

Validio supports field-level lineage as well as dataset-level lineage.

  • Field-level Lineage provides information about how fields relate to other fields and datasets. For example, it can show that data in the column date, in the table gold__daily_sales_summary, is generated from data in the column created_at, in the table silver_sales.
  • Dataset-level Lineage, on the other hand, only provides information about how datasets relate to other datasets. For example, it can show that data in the table gold__daily_sales_summary is generated from data in the table silver_sales, without showing which columns are involved.
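
The relationship between the two levels can be sketched in a few lines: dataset-level lineage is what remains when the field names are dropped from field-level edges. This is an illustrative sketch, not Validio's implementation. The function `to_dataset_level` and the columns `amount` and `total_sales` are hypothetical; the other names come from the example above.

```python
# Field-level edges as ((dataset, field), (dataset, field)) pairs.
field_edges = [
    (("silver_sales", "created_at"), ("gold__daily_sales_summary", "date")),
    # Hypothetical second edge, added only to show that duplicates collapse.
    (("silver_sales", "amount"), ("gold__daily_sales_summary", "total_sales")),
]


def to_dataset_level(edges):
    """Drop field names, keeping one edge per (upstream, downstream) dataset pair."""
    return {(up_dataset, down_dataset) for (up_dataset, _), (down_dataset, _) in edges}


dataset_edges = to_dataset_level(field_edges)
print(dataset_edges)  # {('silver_sales', 'gold__daily_sales_summary')}
```

The reverse direction is not possible: a dataset-level edge alone does not say which columns are involved.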

Automatic Lineage

Lineage is automatically added when you complete the setup of a Credential for a Data Warehouse or Query Engine. Validio queries the Information Schema, or equivalent, to derive information about data assets and edges. Validio can also derive information from a dbt Manifest JSON file.

📘

Automatic Lineage is currently only available for Data Warehouses and Query Engines.

📘

For more information, refer to Automatic Lineage details.
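
As a rough illustration of what a dbt Manifest JSON file contains, the sketch below derives dataset-level edges from the manifest's `parent_map` key, which maps each node's unique ID to the IDs of its direct parents. This is not a description of how Validio processes the file. The function name `edges_from_manifest` and the project name `shop` in the sample are hypothetical, and the sample dictionary is a hand-written stand-in for a real manifest.

```python
def edges_from_manifest(manifest):
    """Return (parent_id, child_id) pairs from a parsed dbt manifest dictionary."""
    edges = set()
    for child_id, parent_ids in manifest.get("parent_map", {}).items():
        for parent_id in parent_ids:
            edges.add((parent_id, child_id))
    return edges


# Hand-written stand-in for a parsed manifest (hypothetical project "shop").
sample_manifest = {
    "parent_map": {
        "model.shop.silver_sales": [],
        "model.shop.gold__daily_sales_summary": ["model.shop.silver_sales"],
    }
}

manifest_edges = edges_from_manifest(sample_manifest)
print(manifest_edges)
```

With a real project, the dictionary would come from `json.load` on the manifest file, which dbt writes to `target/manifest.json` by default. The sketch keeps every entry in `parent_map`, so tests and other non-model nodes would also appear as edges unless filtered out.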

Manual Lineage

You can manually add Lineage information to Validio, both catalog assets and edges:

  • Adding catalog assets - When you add a Source, its dataset and fields are also added as a catalog asset in the Lineage graph. Note that this is not relevant for Data Warehouses and Query Engines, since all their datasets and fields are already added automatically.
  • Adding edges - You can also manually add lineage edges between existing catalog assets. This is useful, for example, for connecting catalog assets from different data sources.

Create Source from catalog asset

For Data Warehouses and Query Engines, Validio's lineage graph displays all accessible datasets as catalog assets. You can click a catalog asset to convert it to a Source, which makes it available for validation.