Lineage describes how data flows through the pipelines in a data stack, from its origin to its final use. Lineage is often seen as a map, or graph, of an organization's data assets and the relations between these assets.
A Lineage graph has two fundamental components:
- Catalog assets - Datasets, along with their fields, are referred to as catalog assets. A catalog asset is represented as a node in a Lineage graph.
- Edges - Edges are the connections between fields and datasets in a Lineage graph.
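To make the two components concrete, here is a minimal sketch of a lineage graph as a set of nodes and directed edges. The class and method names (`CatalogAsset`, `LineageGraph`, `add_edge`, `upstream_of`) are hypothetical and chosen for illustration only. They are not part of Validio's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CatalogAsset:
    """A node in the graph: a dataset, or a field within a dataset."""
    dataset: str
    field_name: Optional[str] = None  # None means the node is the dataset itself


@dataclass
class LineageGraph:
    """Hypothetical container for catalog assets (nodes) and edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (upstream, downstream) pairs

    def add_edge(self, upstream: CatalogAsset, downstream: CatalogAsset) -> None:
        # Adding an edge also registers both endpoints as nodes.
        self.nodes.update((upstream, downstream))
        self.edges.add((upstream, downstream))

    def upstream_of(self, asset: CatalogAsset) -> set:
        # Direct parents only; no transitive traversal.
        return {src for src, dst in self.edges if dst == asset}


graph = LineageGraph()
silver = CatalogAsset("silver_sales")
gold = CatalogAsset("gold__daily_sales_summary")
graph.add_edge(silver, gold)
```

Storing edges as ordered pairs keeps the direction of the data flow explicit: the first element is the upstream asset and the second is the downstream asset.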
Validio supports field-level lineage as well as dataset-level lineage.
- Field-level Lineage provides information about how fields relate to other fields and datasets. For example, it shows that data in the column date, in the table gold__daily_sales_summary, is generated based on data in the column created_at, in an upstream table.
- Dataset-level Lineage, on the other hand, only provides information about how datasets relate to other datasets. For example, that data in the table gold__daily_sales_summary is generated based on data in the table silver_sales, without providing information about which columns are involved.
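The relationship between the two levels can be sketched as follows: dataset-level edges are what remains of field-level edges once the field names are dropped and duplicates are removed. This is an illustrative sketch, not Validio's implementation. The function name `to_dataset_level`, the `amount` and `total_amount` columns, and the placement of `created_at` in `silver_sales` are assumptions made for the example.

```python
# Field-level edges as ((dataset, field), (dataset, field)) pairs.
# Assumption: created_at lives in silver_sales; the second edge is hypothetical.
field_edges = [
    (("silver_sales", "created_at"), ("gold__daily_sales_summary", "date")),
    (("silver_sales", "amount"), ("gold__daily_sales_summary", "total_amount")),
]


def to_dataset_level(edges):
    """Drop field names and de-duplicate, keeping dataset-to-dataset edges."""
    return sorted({(src[0], dst[0]) for src, dst in edges if src[0] != dst[0]})


dataset_edges = to_dataset_level(field_edges)
```

Both field-level edges collapse into a single dataset-level edge from silver_sales to gold__daily_sales_summary, which is why the dataset-level view cannot tell you which columns are involved.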
Lineage is added automatically when you complete the setup of a Credential for a Data Warehouse or Query Engine. Validio queries the Information Schema, or equivalent, to derive information about data assets and edges. Validio can also derive information from a dbt Manifest JSON file.
Automatic Lineage is currently only available for Data Warehouses and Query Engines.
For more information, refer to Automatic Lineage details.
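As an illustration of the kind of information a dbt manifest carries, the sketch below reads dataset-level edges from the `depends_on.nodes` list that dbt records for each entry under `nodes`. It is not a description of how Validio parses the file. The function name `dataset_edges_from_manifest` and the trimmed-down manifest with the `shop` project name are hypothetical.

```python
import json


def dataset_edges_from_manifest(manifest: dict) -> list:
    """Return sorted (upstream_id, downstream_id) pairs from a dbt manifest."""
    edges = set()
    for unique_id, node in manifest.get("nodes", {}).items():
        # Each dbt node lists the unique IDs of the nodes it depends on.
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.add((parent_id, unique_id))
    return sorted(edges)


# Hypothetical, heavily trimmed manifest; a real file has many more keys.
manifest = json.loads("""
{
  "nodes": {
    "model.shop.silver_sales": {
      "depends_on": {"nodes": ["source.shop.raw.sales"]}
    },
    "model.shop.gold__daily_sales_summary": {
      "depends_on": {"nodes": ["model.shop.silver_sales"]}
    }
  }
}
""")

edges = dataset_edges_from_manifest(manifest)
```

For a real project, you would load the manifest with `json.load` from the `manifest.json` file that dbt writes to its target directory instead of using an inline string.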
You can also add Lineage information to Validio manually, both as catalog assets and as edges:
- Adding catalog assets - By adding a Source, you also add the same dataset, with its fields, as a catalog asset to the Lineage graph. Note that this is not relevant for Data Warehouses and Query Engines, since all their datasets and fields are already added automatically.
- Adding edges - You can also manually add lineage edges between existing catalog assets. This is useful, for example, for connecting catalog assets from different data sources.
For Data Warehouses and Query Engines, Validio's lineage graph displays all accessible datasets as catalog assets. Clicking a catalog asset lets you convert it to a Source, which enables it for further validation.