Platform Terminology

This page defines the key terms and concepts used across the Validio platform, from data observability fundamentals to platform-specific terminology.

📘

Looking for the Business Glossary feature?

This page defines platform terminology. For the Business Glossary feature that lets you define and manage business terms for your data, see Business Glossary.

For an overview of how these concepts work together in the Validio platform, see Key Concepts.

Anchor

A starting asset or field for lineage graph exploration. Multiple anchors can be set to define the scope of a lineage view.

Assignment

The link between a glossary term and a catalog asset or schema field. A term can be assigned to multiple resources, and a resource can have multiple terms.

Backfill

The process of reprocessing historical data through validators after a reset, rerun, or initial start with backfill. During a backfill, Validio evaluates past data against the current validator configuration. Incidents detected during a backfill can optionally be auto-resolved via the Auto-resolve backfilled incidents workspace setting, preventing historical reprocessing from affecting the data quality score. See Configuring Global Settings.

Business Glossary

A centralized repository for business term definitions in Validio, enabling consistent data vocabulary across an organization. See Business Glossary.

Classification

A governance tag attached to a glossary term. Validio supports three classification types: Critical Data Element, Regulatory Scope, and Data Sensitivity. Classifications are managed under Catalog > Classifications. See Classifications.

Critical Data Element

Abbreviated CDE. A classification that marks a schema field as critical to a business or regulatory outcome. A CDE record carries an owner, a regulator, and a set of required data quality dimensions that must be covered for every field tagged with the CDE. See Classifications.

Data Observability

A state where an organization has full visibility into its data pipelines. This state enables data teams to improve data quality over time. Data Observability is not realized overnight; rather, it is a vision that data teams should always strive towards and continuously come closer to.

Data Quality

The extent to which an organization’s data can be considered fit for its intended purpose. Data quality can be considered along five dimensions: freshness, volume, schema, (lack of) anomalous values, and distribution. It is always relative to the data’s specific business context.

Data Quality Dimension

A DAMA-aligned governance label for what kind of data quality problem a validator looks for. Validio uses six dimensions: Completeness, Validity, Timeliness, Accuracy, Consistency, and Uniqueness. Each validator can declare a single dimension and a list of data quality fields it covers. See Data Quality Dimensions.

Data Quality Fields

The schema fields a validator is intended to govern. These are distinct from the technical fields the validator computes on. For example, a Freshness validator on a loaded_at timestamp may govern a different business field. See Data Quality Dimensions.

Data Quality Incidents

Data that does not fulfill the requirements of data quality and is therefore not considered fit for its intended purpose. In the Validio platform, an incident is triggered when a metric value breaches a set threshold.
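
The threshold rule above can be illustrated with a minimal sketch. This is not Validio's API: the `Threshold` class, the `find_incidents` function, and the row counts are hypothetical names and values chosen for illustration, and a fixed lower and upper bound is assumed as the threshold type.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical fixed threshold: values outside [lower, upper] breach it."""
    lower: float
    upper: float

    def is_breached(self, value: float) -> bool:
        return value < self.lower or value > self.upper


def find_incidents(metrics: dict[str, float], threshold: Threshold) -> list[str]:
    """Return the window labels whose metric value breaches the threshold."""
    return [window for window, value in metrics.items() if threshold.is_breached(value)]


# Hypothetical row counts per daily window.
row_counts = {"2024-01-01": 1020.0, "2024-01-02": 985.0, "2024-01-03": 12.0}
incidents = find_incidents(row_counts, Threshold(lower=900.0, upper=1100.0))
print(incidents)  # ['2024-01-03']
```

Only the third window falls outside the assumed bounds, so it is the only one flagged.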

Data Sensitivity

A classification type that marks schema fields with a sensitivity level: Public, Internal, Confidential, or Restricted. See Classifications.

Datapoint

The actual value of a piece of data. For example, "FGK3". A datapoint can be either a scalar value (such as "5") or a non-scalar value (such as a JSON blob).

Dataset

A defined container of records that have the same schema.

Deep Data Observability

Data observability is considered deep if it is comprehensive along six dimensions: data sources, data formats, data granularity, validator configuration, cadence, and user focus.

Deep Data Observability Platform

A system designed specifically to help an organization come closer to the state of deep data observability.

Domain

An organizational category for grouping glossary terms and catalog assets by business area (e.g., Finance, Marketing). A first-class entity in the catalog. See Domains.

Domain Lineage

A business-level view of the lineage graph where nodes are domains and edges are auto-derived from glossary term assignments on connected catalog assets and fields. See About Domain Lineage.

Field

A container for datapoints of the same kind. For example, a column in a database or a key in a JSON schema.

File-based

Each file, such as a CSV file, is its own window. This is mainly useful for data in object storage.

Filter

A rule that determines which raw data should be validated.

Fixed-batch

Groups a fixed number (N) of consecutive events into each window. For example, "100 events per window".
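
As a rough sketch of the idea, the following groups a sequence of events into windows of N. The function name `fixed_batch_windows` and the event shape are assumptions made for illustration and are not part of the Validio platform. The sketch also emits a shorter final window when the events do not divide evenly, which is a choice of this example and not a statement about platform behavior.

```python
from itertools import islice
from typing import Iterable, Iterator


def fixed_batch_windows(events: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive windows of `size` events; the last window may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(events)
    # islice pulls at most `size` events; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


# 250 hypothetical events grouped as "100 events per window".
events = [{"id": i} for i in range(250)]
windows = list(fixed_batch_windows(events, size=100))
print([len(w) for w in windows])  # [100, 100, 50]
```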

Glossary Term

A named business concept with a description, aliases, a domain, and an owner. The primary entity in the Business Glossary.

Integration

A workspace-wide resource that stores the connection and credentials for an external service such as a Slack workspace, a Jira instance, or an SMTP server. Notification channels reference an integration to know how to reach the external service, while adding the destination details (for example, a specific Slack channel ID). Manage integrations at Workspace > Integrations. See Managing Integrations.

Not to be confused with data source integrations such as Snowflake or BigQuery. Those are configured through Credentials and surfaced on the catalog as sources.

Propagation

The spreading of glossary term assignments to connected entities (assets or fields) via lineage edges. Propagation can go upstream or downstream. See Glossary Term Propagation.
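
One way to picture propagation is as a traversal of the lineage graph. The sketch below is illustrative only: `propagate_term`, the edge list, and the asset names are hypothetical, and it assumes that propagation follows every connected edge transitively, which this page does not specify.

```python
from collections import deque


def propagate_term(edges: list[tuple[str, str]], start: str,
                   direction: str = "downstream") -> set[str]:
    """Return the entities reachable from `start` along lineage edges.

    Each edge is an (upstream, downstream) pair.
    """
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    neighbours: dict[str, list[str]] = {}
    for upstream, downstream in edges:
        if direction == "downstream":
            neighbours.setdefault(upstream, []).append(downstream)
        else:
            neighbours.setdefault(downstream, []).append(upstream)
    reached: set[str] = set()
    queue = deque([start])
    while queue:  # breadth-first traversal of the lineage graph
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt != start and nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


# Hypothetical lineage edges between catalog assets.
edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.revenue"),
    ("raw.customers", "mart.revenue"),
]
print(sorted(propagate_term(edges, "staging.orders")))  # ['mart.revenue']
print(sorted(propagate_term(edges, "mart.revenue", "upstream")))
# ['raw.customers', 'raw.orders', 'staging.orders']
```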

Record

A collection of related fields treated as a unit, such as a row in a database table or an event in a stream.

Regulatory Scope

A classification type that marks schema fields as in scope for a specific regulation (e.g., GDPR, BCBS 239, SOX) without making the stronger criticality statement of a CDE. See Classifications.

Required Dimension

A data quality dimension declared on a Critical Data Element as mandatory coverage. Fields tagged with the CDE must have at least one validator declaring the matching dimension to be considered covered.
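
The coverage rule can be sketched as a set difference. The `Validator` record, the `missing_dimensions` function, and the field names below are hypothetical and only model the rule as described here; they are not Validio's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Validator:
    """Hypothetical validator record: one declared dimension, many governed fields."""
    name: str
    dimension: str
    data_quality_fields: set[str] = field(default_factory=set)


def missing_dimensions(field_name: str, required: set[str],
                       validators: list[Validator]) -> set[str]:
    """Return the required dimensions that no validator covers for `field_name`."""
    covered = {v.dimension for v in validators if field_name in v.data_quality_fields}
    return required - covered


validators = [
    Validator("null_check", "Completeness", {"customer_id"}),
    Validator("freshness", "Timeliness", {"order_total"}),
]
# Dimensions a hypothetical CDE declares as required.
required = {"Completeness", "Uniqueness"}
print(missing_dimensions("customer_id", required, validators))  # {'Uniqueness'}
```

A field counts as covered once the returned set is empty.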

Rerun

A non-destructive re-execution of a validator from a specific window. Unlike a reset, a rerun preserves existing metric data and incident history, and records the new results alongside the original values. This maintains a complete audit trail showing the original failure and subsequent successful validation. See Rerunning a Validator.

Reset

A destructive action that deletes all existing metric data and incidents from a source or validator and places the source or validator into a pending backfill mode. Unlike a rerun, a reset erases history and cannot be undone. See Resetting a Validator and Resetting a Source.
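
The contrast between a rerun and a reset can be sketched with a toy in-memory model. `ValidatorState` and its methods are hypothetical and greatly simplified: for instance, this rerun touches a single window, whereas the Rerun entry describes re-execution from a specific window. The sketch only shows which operation keeps history and which discards it.

```python
from dataclasses import dataclass, field


@dataclass
class ValidatorState:
    """Hypothetical in-memory stand-in for a validator's stored results."""
    metrics: dict[str, list[float]] = field(default_factory=dict)  # window -> recorded values
    pending_backfill: bool = False

    def rerun(self, window: str, new_value: float) -> None:
        """Non-destructive: record the new result alongside the original value."""
        self.metrics.setdefault(window, []).append(new_value)

    def reset(self) -> None:
        """Destructive: drop all stored results and wait for a backfill."""
        self.metrics.clear()
        self.pending_backfill = True


state = ValidatorState(metrics={"2024-01-03": [12.0]})
state.rerun("2024-01-03", 1004.0)
print(state.metrics)  # {'2024-01-03': [12.0, 1004.0]}

state.reset()
print(state.metrics, state.pending_backfill)  # {} True
```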

Saved Search

A persisted combination of anchors and filters for the lineage page, shareable across the workspace.

Segmentation

A subset of a dataset, defined by a segmentation field (dimension). For example, if "gender" is a segmentation field, then "male" and "female" could be two segments.
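
A minimal sketch of the example above, assuming records are plain dictionaries. The `segment_records` function and the sample records are hypothetical and serve only to show how one field value maps to one segment.

```python
from collections import defaultdict


def segment_records(records: list[dict], segmentation_field: str) -> dict[str, list[dict]]:
    """Group records into segments keyed by the value of the segmentation field."""
    segments: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        segments[record[segmentation_field]].append(record)
    return dict(segments)


records = [
    {"gender": "male", "age": 31},
    {"gender": "female", "age": 27},
    {"gender": "female", "age": 45},
]
segments = segment_records(records, "gender")
print({name: len(rows) for name, rows in segments.items()})  # {'male': 1, 'female': 2}
```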

Shallow Data Observability

Data observability that is not deep. That is, not comprehensive along the six dimensions of data sources, data formats, data granularity, validator configuration, cadence, and user focus.

Sources

Sources, or source connectors, integrate your data source with Validio and read data into the platform. Data read by Validio can then be monitored and validated.

Validator

A component in Validio that validates data, that is, makes sure the data behaves as expected. Validators can consist of, for example, filters, aggregations, and rules.

Validio / the platform

Validio or the platform refers to the Validio platform, unless otherwise specified.

Window

A subset of a dataset, defined by a rule that groups records. Windows can be based on time, count, or file.
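
For the time-based case, the grouping rule can be sketched as follows. The sketch assumes non-overlapping, fixed-width windows (often called tumbling windows), which is only one possible kind of time-based window. The `tumbling_windows` function, the `loaded_at` field, and the sample records are illustrative, not platform behavior.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def tumbling_windows(records: list[dict], timestamp_field: str,
                     width: timedelta) -> dict[datetime, list[dict]]:
    """Group records into non-overlapping, fixed-width windows keyed by window start."""
    epoch = datetime(1970, 1, 1)
    windows: dict[datetime, list[dict]] = defaultdict(list)
    for record in records:
        # Whole number of window widths elapsed since the epoch.
        offset = (record[timestamp_field] - epoch) // width
        windows[epoch + offset * width].append(record)
    return dict(sorted(windows.items()))


records = [
    {"loaded_at": datetime(2024, 1, 1, 9, 15), "amount": 10},
    {"loaded_at": datetime(2024, 1, 1, 9, 45), "amount": 20},
    {"loaded_at": datetime(2024, 1, 1, 10, 5), "amount": 30},
]
hourly = tumbling_windows(records, "loaded_at", timedelta(hours=1))
print({start.isoformat(): len(rows) for start, rows in hourly.items()})
# {'2024-01-01T09:00:00': 2, '2024-01-01T10:00:00': 1}
```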