Data Observability

A state in which an organization has full visibility into its data pipelines, enabling data teams to improve data quality over time. Data Observability is not achieved overnight; rather, it is a vision that data teams should always strive towards and continuously come closer to.

Data quality

The extent to which an organization’s data can be considered fit for its intended purpose. Data quality can be considered along five dimensions: freshness, volume, schema, (lack of) anomalous values, and distribution. It is always relative to the data’s specific business context.
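As a rough illustration, two of these dimensions, freshness and volume, can be expressed as simple checks. This is a minimal, hypothetical sketch: the function names and the thresholds passed to them are assumptions, and in practice acceptable values depend on the data's business context.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated: datetime, max_age: timedelta, now: datetime) -> bool:
    """Freshness: the newest data must be no older than max_age (hypothetical check)."""
    return now - last_updated <= max_age


def volume_ok(row_count: int, expected_min: int, expected_max: int) -> bool:
    """Volume: the number of records must fall within an expected range (hypothetical check)."""
    return expected_min <= row_count <= expected_max
```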

Data quality incidents

Data that does not meet the organization’s data quality requirements and is therefore not fit for its intended purpose. In the Validio platform, an incident is triggered when a metric value breaches a set threshold.
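A threshold breach on a metric can be sketched as follows. This is a minimal illustration, not Validio's implementation; the `Incident` class and `check_threshold` function are hypothetical, and a bound set to `None` is assumed to mean "no limit on that side".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """Hypothetical record of a metric value that breached its threshold."""
    metric: str
    value: float
    lower: Optional[float]
    upper: Optional[float]


def check_threshold(
    metric: str,
    value: float,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> Optional[Incident]:
    """Return an Incident if value falls outside [lower, upper], else None."""
    below = lower is not None and value < lower
    above = upper is not None and value > upper
    if below or above:
        return Incident(metric, value, lower, upper)
    return None
```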

Datapoint

The actual value of a piece of data, for example "FGK3". A datapoint can be either a scalar value (such as "5") or a non-scalar value (such as a JSON blob).

Dataset

A defined container of records that have the same schema.

Deep Data Observability

Data observability is considered deep if it is comprehensive along six dimensions: data sources, data formats, data granularity, validator configuration, cadence, and user focus.

Deep Data Observability Platform

A system designed specifically to help an organization come closer to the state of deep data observability.

Field

A container for datapoints of the same kind. For example, a column in a database or a key in a JSON schema.

File-based

A windowing strategy in which each file, such as a CSV file, forms its own window. This is mainly useful for data in object storage.
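A minimal sketch of file-based windowing, assuming records arrive as hypothetical `(file_name, record)` pairs, for example while reading objects from a bucket. Each distinct file name becomes one window.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def file_windows(rows: Iterable[Tuple[str, dict]]) -> Dict[str, List[dict]]:
    """Group (file_name, record) pairs so that each file becomes its own window."""
    windows: Dict[str, List[dict]] = defaultdict(list)
    for file_name, record in rows:
        windows[file_name].append(record)
    return dict(windows)
```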

Filter

A rule that determines which raw data should be validated.
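Conceptually, a filter is a predicate applied to each record before validation. In this hypothetical sketch, records are plain dictionaries and the `status` field is an assumed example; only records for which the predicate returns `True` are passed on to validation.

```python
from typing import Callable, Iterable, List

Record = dict
Filter = Callable[[Record], bool]


def apply_filter(records: Iterable[Record], keep: Filter) -> List[Record]:
    """Keep only the records that the filter selects for validation."""
    return [r for r in records if keep(r)]


def completed_only(record: Record) -> bool:
    """Hypothetical filter: only validate completed orders."""
    return record.get("status") == "completed"
```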

Fixed-batch

A windowing strategy that groups a fixed number (N) of consecutive events into each window. For example, "100 events per window".
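Fixed-batch windowing can be sketched as chunking a stream of events into lists of N. This minimal example assumes that a final, incomplete batch is emitted as-is; a streaming system might instead wait for it to fill or close it after a timeout.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fixed_batches(events: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield consecutive windows of n events; the last window may be shorter."""
    if n <= 0:
        raise ValueError("n must be positive")
    it = iter(events)
    # Take up to n events at a time until the stream is exhausted.
    while batch := list(islice(it, n)):
        yield batch
```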

Record

A collection of related fields treated as a unit, such as a row in a database table or an event in a stream.
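The relationship between datapoint, field, record, and dataset can be illustrated with plain Python structures. In this sketch, which is an assumption rather than how Validio stores data, a record is a dictionary from field name to datapoint and a dataset is a list of records; the schema check is simplified to comparing field names and ignores types.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # field name -> datapoint


def same_schema(records: List[Record]) -> bool:
    """Simplified check that all records share the same set of field names."""
    if not records:
        return True
    schema = set(records[0])
    return all(set(r) == schema for r in records)


# A dataset: records with the same schema, holding scalar and non-scalar datapoints.
dataset: List[Record] = [
    {"id": "FGK3", "amount": 5, "payload": {"source": "web"}},
    {"id": "HXL9", "amount": 12, "payload": {"source": "app"}},
]

# The "amount" field, viewed as the collection of its datapoints.
amount_field = [r["amount"] for r in dataset]
```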

Segmentation

Splitting a dataset into subsets (segments), each defined by a value of a segmentation field (dimension). For example, if "gender" is the segmentation field, then "male" and "female" could be two segments.
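Segmentation amounts to grouping records by the value of the segmentation field. The sketch below is a minimal, hypothetical illustration using the "gender" example; records that lack the field are placed in a segment keyed by `None`, which is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, List


def segment(records: List[dict], segmentation_field: str) -> Dict[Any, List[dict]]:
    """Group records into segments keyed by the value of the segmentation field."""
    segments: Dict[Any, List[dict]] = defaultdict(list)
    for r in records:
        segments[r.get(segmentation_field)].append(r)
    return dict(segments)
```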

Shallow Data Observability

Data observability that is not deep, meaning it is not comprehensive along the six dimensions of data sources, data formats, data granularity, validator configuration, cadence, and user focus.

Sources

Sources, or source connectors, integrate your data sources with Validio and read data into the platform. Data read by Validio can then be monitored and validated.

Validator

A component in Validio that validates data, that is, makes sure the data behaves as expected. A validator can consist of, for example, filters, aggregations, and rules.
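A validator built from a filter, an aggregation, and a rule can be sketched as a small pipeline. This is a hypothetical illustration, not Validio's configuration model: `make_validator`, the `status` and `amount` fields, and the choice of the mean as aggregation are assumptions.

```python
from statistics import mean
from typing import Callable, List


def make_validator(
    keep: Callable[[dict], bool],
    aggregate: Callable[[List[float]], float],
    field: str,
    rule: Callable[[float], bool],
) -> Callable[[List[dict]], bool]:
    """Compose filter -> aggregation -> rule into a single validation function."""

    def validate(records: List[dict]) -> bool:
        values = [r[field] for r in records if keep(r)]  # filter step
        if not values:
            return True  # assumption: nothing left to validate counts as passing
        return rule(aggregate(values))  # aggregation, then rule

    return validate
```

For example, a validator that requires the mean amount of completed orders to lie between 10 and 100 would be `make_validator(lambda r: r["status"] == "completed", mean, "amount", lambda m: 10 <= m <= 100)`.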

Validio / the platform

Validio or the platform refers to the Validio platform, unless otherwise specified.

Window

A subset of a dataset, defined by a rule for grouping records. Windows can be based on time, count, or file.
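Time-based windows are often implemented as fixed-size, non-overlapping (tumbling) windows. The sketch below assumes each record carries a timezone-aware timestamp in a hypothetical `ts` field and aligns windows to the Unix epoch; other alignments and overlapping windows are equally possible.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_windows(
    records: List[dict], size: timedelta, time_field: str = "ts"
) -> Dict[datetime, List[dict]]:
    """Group records into fixed-size time windows keyed by window start."""
    windows: Dict[datetime, List[dict]] = defaultdict(list)
    for r in records:
        # Number of whole windows between the epoch and this record's timestamp.
        offset = (r[time_field] - EPOCH) // size
        start = EPOCH + offset * size
        windows[start].append(r)
    return dict(windows)
```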