About Data Lineage

Validio Data Lineage tracks and visualizes how data flows through your data stack, helping you triage and troubleshoot data quality issues across upstream and downstream sources.

By following data from its origin to its final use, you can more easily manage, triage, and troubleshoot data quality issues across all of your sources in Validio.

[Figure: Lineage page with graph]

The lineage graph presents data flow from left to right, with upstream entities on the left and downstream entities on the right. You start by selecting one or more anchor assets or fields as entry points for exploration, then progressively expand the graph to discover upstream and downstream connections. Multiple anchors let you approach the lineage from different perspectives. For example, you can start from both a source table and a downstream report to understand how they connect.
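
As a rough mental model, you can picture the lineage graph as a directed graph whose edges point from an upstream asset to a downstream one. The following Python sketch is illustrative only and does not reflect Validio's internal implementation: the `LineageGraph` class, its `expand` method, and the asset names are all hypothetical. It shows how two anchors, expanded one hop at a time, gradually reveal the path that connects them.

```python
from collections import defaultdict


class LineageGraph:
    """Toy directed graph: an edge points from an upstream asset to a downstream one."""

    def __init__(self):
        self.downstream = defaultdict(set)  # asset -> assets that read from it
        self.upstream = defaultdict(set)    # asset -> assets it reads from

    def add_edge(self, source, target):
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def expand(self, visible, node, direction):
        """Return `visible` plus the one-hop neighbours of `node` in one direction."""
        neighbours = self.downstream if direction == "downstream" else self.upstream
        return visible | neighbours.get(node, set())


# Hypothetical assets, listed from upstream (left) to downstream (right).
graph = LineageGraph()
graph.add_edge("raw.orders", "staging.orders")
graph.add_edge("staging.orders", "marts.revenue")
graph.add_edge("marts.revenue", "dashboard.revenue_report")

# Two anchors: a source table and a downstream report.
visible = {"raw.orders", "dashboard.revenue_report"}
visible = graph.expand(visible, "raw.orders", "downstream")
visible = graph.expand(visible, "dashboard.revenue_report", "upstream")
print(sorted(visible))
```

After the two expansions, the visible set contains all four assets, so the connection between the source table and the report is fully uncovered.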

Key capabilities of Validio Data Lineage include:

  • Asset and field-level lineage -- Visualize relationships at both the dataset and individual field level.
  • Progressive exploration -- Start from anchor points and expand connections on demand. The graph loads a portion of the lineage initially and fetches additional connections as you expand nodes, so large lineages remain navigable.
  • Glossary term integration -- View and manage business glossary terms directly on lineage nodes and fields.
  • Data quality overlays -- See incident severity indicators on nodes to quickly identify quality issues.
  • Impact analysis -- Overlay incidents on the graph to trace how issues propagate downstream, complementing root cause analysis for upstream tracing.
  • Saved searches -- Save anchor and filter combinations to revisit lineage views. See Customizing and Saving Views.
  • Focus mode -- Isolate lineage paths connected to a selected node or field.
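
Impact analysis and focus mode can both be understood as reachability questions on a directed graph. The sketch below is a simplified illustration under assumed names (`reachable`, `focus`, and the sample edges are hypothetical) and is not a description of how Validio computes these views. Downstream reachability from an asset with an incident gives the potentially impacted assets, and the union of upstream and downstream reachability gives the paths that a focused view would keep.

```python
from collections import deque

# Hypothetical (upstream, downstream) pairs.
EDGES = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "marts.revenue"),
    ("marts.revenue", "dashboard.revenue_report"),
]


def reachable(edges, start, direction="downstream"):
    """Breadth-first search for every asset reachable from `start`."""
    adjacency = {}
    for source, target in edges:
        if direction == "upstream":
            source, target = target, source  # walk the edges backwards
        adjacency.setdefault(source, set()).add(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour != start and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def focus(edges, node):
    """Assets on any lineage path that passes through `node`."""
    return {node} | reachable(edges, node, "upstream") | reachable(edges, node, "downstream")


impacted = reachable(EDGES, "staging.orders")   # downstream of an incident
focused = focus(EDGES, "staging.orders")        # excludes the customers branch
```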

For more information about the graph structure, see About the Lineage Graph. For instructions on navigating and using the lineage page, see Using Lineage.

How Lineage Is Generated

Validio automatically infers lineage from the data sources you add. This process includes generating the catalog assets and deriving the relationships between datasets and fields. To generate the catalog assets, Validio reads from the Information Schema, or equivalent metadata. Then, Validio combines information from query logs and (if provided) dbt Manifest files to map the lineage relationships.

  • For data warehouse and query engine sources, the lineage process is automated after you complete the setup of a credential for either source type.
  • For other source types, such as object storage and data streams, catalog assets are only added after you add new sources.
  • You can manually define new lineage relationships between datasets and fields. For more information, see Using Lineage.
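
To see what kind of relationship information a dbt Manifest file carries, consider that dbt's `manifest.json` lists each node's parents under `depends_on.nodes`. The following sketch reads those entries into upstream-to-downstream pairs. It is an illustration only: the `manifest_edges` helper and the sample manifest are hypothetical, and how Validio actually processes the file is not described here.

```python
import json


def manifest_edges(manifest):
    """Return sorted (upstream_id, downstream_id) pairs from a parsed dbt manifest."""
    edges = []
    for node_id, node in manifest.get("nodes", {}).items():
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent_id, node_id))
    return sorted(edges)


# Heavily trimmed, hypothetical manifest; a real file contains many more keys.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.revenue": {
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    }
  }
}
""")

edges = manifest_edges(SAMPLE_MANIFEST)
for upstream_id, downstream_id in edges:
    print(f"{upstream_id} -> {downstream_id}")
```

To try this on a real project, load the file with `json.load` and pass the resulting dictionary to the helper.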

Validio runs daily jobs, for example SQL queries, to collect current information about lineage relationships from the Information Schema.

  • This means that a delay of up to 24 hours can occur between the time a new relationship is created and the time it becomes visible in Validio.
  • To ensure that lineage relationships between table assets are up to date, Validio disregards SQL queries older than 30 days.
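
The 30-day cutoff can be pictured as a simple filter over query log entries. The sketch below is an assumption-based illustration: the `recent_queries` helper, the `executed_at` field, and the sample log are hypothetical, and Validio's actual log format and filtering logic are not documented here.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=30)  # queries older than this are disregarded


def recent_queries(query_log, now):
    """Keep only the log entries executed within the lookback window."""
    cutoff = now - LOOKBACK
    return [entry for entry in query_log if entry["executed_at"] >= cutoff]


# Hypothetical log entries with a fixed reference time for reproducibility.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
query_log = [
    {"sql": "INSERT INTO marts.revenue SELECT ...", "executed_at": now - timedelta(days=2)},
    {"sql": "INSERT INTO marts.legacy SELECT ...", "executed_at": now - timedelta(days=45)},
]

kept = recent_queries(query_log, now)
```

Only the query from two days ago survives the filter, so a relationship that appears only in the 45-day-old query would no longer contribute to the lineage.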

dbt Manifest JSON File

Currently, you can upload the dbt Manifest JSON file only by using the Validio CLI. For more information, see Validio CLI.