About Data Lineage

Validio Data Lineage allows you to track and visualize how data flows through your data stack from its origin to its final use. This information can help simplify how you manage, triage, and troubleshoot data quality issues across all of your sources in Validio.

Validio Lineage page with asset details panel


Lineage presents the data flow as a knowledge graph that displays your organization’s catalog assets and the relationships between them. For more information, see About the Lineage Graph.

Validio automatically infers lineage from the data sources you add. This process includes generating the catalog assets and deriving the relationships between datasets and fields. To generate the catalog assets, Validio reads from the Information Schema, or equivalent metadata. Then, Validio combines information from query logs and (if provided) dbt Manifest files to map the lineage relationships.

  • For data warehouse and query engine sources, lineage is generated automatically once you finish setting up a credential for either source type.
  • For other source types, such as object storage and data streams, catalog assets are only added after you add new sources.
  • You can manually define new lineage relationships between datasets and fields. For more information, see Using Lineage.
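As an illustration of how relationships can be read from a dbt Manifest file, the sketch below turns the `parent_map` section of a `manifest.json` into (upstream, downstream) pairs. This is a minimal sketch, not Validio's implementation: the function name `edges_from_manifest` and the sample node IDs are hypothetical, and the only assumption about dbt is that the manifest contains a `parent_map` keyed by node ID.

```python
def edges_from_manifest(manifest: dict) -> set[tuple[str, str]]:
    """Return (upstream, downstream) pairs from a dbt manifest's parent_map.

    Hypothetical helper for illustration; not part of Validio or dbt.
    """
    edges = set()
    for child, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            edges.add((parent, child))
    return edges


# Hand-written fragment standing in for a parsed manifest.json (made-up IDs).
sample_manifest = {
    "parent_map": {
        "model.shop.orders_enriched": [
            "source.shop.raw.orders",
            "model.shop.customers",
        ],
        "model.shop.customers": ["source.shop.raw.customers"],
    }
}

manifest_edges = edges_from_manifest(sample_manifest)
for upstream, downstream in sorted(manifest_edges):
    print(f"{upstream} -> {downstream}")
```

With a real file, the dictionary would come from `json.load` on the manifest rather than from a literal.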

Validio runs daily jobs, for example SQL queries, to collect current information about lineage relationships from the Information Schema.

  • This means that it can take up to 24 hours from when a new relationship is created until it is visible in Validio.
  • To ensure that lineage relationships between table assets are up to date, Validio disregards SQL queries older than 30 days.
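The 30-day rule above can be pictured as a simple cutoff on query timestamps. The following is a sketch under stated assumptions, not Validio's actual job: the record shape (`sql`, `executed_at`) and the function name `recent_queries` are hypothetical, and a query that is exactly 30 days old is assumed to be kept.

```python
from datetime import datetime, timedelta, timezone

# Assumption: queries older than this are ignored when deriving lineage.
MAX_QUERY_AGE = timedelta(days=30)


def recent_queries(queries: list[dict], now: datetime) -> list[dict]:
    """Keep only the queries executed within the last 30 days."""
    cutoff = now - MAX_QUERY_AGE
    return [q for q in queries if q["executed_at"] >= cutoff]


# Made-up query log entries for illustration.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
query_log = [
    {"sql": "INSERT INTO a SELECT * FROM b", "executed_at": now - timedelta(days=10)},
    {"sql": "INSERT INTO c SELECT * FROM d", "executed_at": now - timedelta(days=30)},
    {"sql": "INSERT INTO e SELECT * FROM f", "executed_at": now - timedelta(days=60)},
]

kept = recent_queries(query_log, now)
```

Only the first two entries survive the cutoff, so a relationship that last appeared in a query 60 days ago would no longer contribute to the graph.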

Lineage relationships derived from historical SQL queries may be inaccurate: some relationships cannot be derived from historical queries, and some that can be derived may no longer be current. Lineage from a dbt Manifest JSON file is complete and current to the extent of the dbt implementation. When a dataset appears in both the query logs and the dbt Manifest JSON file, Validio merges the two lineages into a single, unified lineage.
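One way to picture the merge is as a union of two edge sets that records where each relationship was observed. This is only a sketch under assumptions: the function name `merge_lineage`, the origin labels, and the table names are hypothetical, and Validio's actual merge logic is not described here.

```python
Edge = tuple[str, str]  # (upstream dataset, downstream dataset)


def merge_lineage(
    query_log_edges: set[Edge], manifest_edges: set[Edge]
) -> dict[Edge, set[str]]:
    """Union two edge sets, tagging each edge with the origins that reported it."""
    merged: dict[Edge, set[str]] = {}
    labelled = (("query_log", query_log_edges), ("dbt_manifest", manifest_edges))
    for origin, edges in labelled:
        for edge in edges:
            merged.setdefault(edge, set()).add(origin)
    return merged


# Made-up edges for illustration.
from_queries = {("raw.orders", "analytics.orders"), ("raw.events", "analytics.sessions")}
from_manifest = {("raw.orders", "analytics.orders"), ("raw.customers", "analytics.customers")}

merged = merge_lineage(from_queries, from_manifest)
```

An edge reported by both origins appears once in the result, so the combined graph has no duplicate relationships.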

Note: Currently, you must use the Validio CLI to upload the dbt Manifest JSON file. For more information, see Validio CLI.