Glossary
Definitions of terms and concepts used in Validio
Data Observability
A state where an organization has full visibility into its data pipelines. This state enables data teams to improve data quality over time. Data Observability is not realized overnight, but is rather a vision that data teams should always strive towards and continuously come closer to.
Data quality
The extent to which an organization’s data can be considered fit for its intended purpose. Data quality can be considered along five dimensions: freshness, volume, schema, (lack of) anomalous values, and distribution. It is always relative to the data’s specific business context.
Data quality incidents
Data that does not fulfill the requirements of data quality and is therefore not considered fit for its intended purpose. In the Validio platform, an incident is triggered when a metric value breaches a set threshold.
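The threshold rule described above can be sketched as a simple bounds check. This is a minimal illustration only, not Validio's implementation: the function name `is_incident`, the `lower` and `upper` parameters, and the choice of strict comparisons are all assumptions made for the example.

```python
from typing import Optional


def is_incident(
    metric_value: float,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> bool:
    """Return True if the metric value falls outside the allowed range.

    Hypothetical helper: bounds are treated as inclusive limits of the
    allowed range, so only values strictly outside them count as a breach.
    """
    below = lower is not None and metric_value < lower
    above = upper is not None and metric_value > upper
    return below or above
```

For example, with an upper threshold of 100, a metric value of 120 would count as an incident in this sketch, while a value of 50 would not.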
Datapoint
The actual value of a piece of data. For example, "FGK3". A datapoint can be either a scalar value (such as "5") or a non-scalar value (such as a JSON blob).
Dataset
A defined container of records that have the same schema.
Deep Data Observability
Data observability is considered deep if it is comprehensive in terms of six dimensions: data sources, data formats, data granularity, validator configuration, cadence, and user focus.
Deep Data Observability Platform
A system designed specifically to help an organization come closer to the state of deep data observability.
Field
A container for datapoints of the same kind. For example, a column in a database or a key in a JSON schema.
File-based
A window type in which each file, such as a CSV file, is its own window. This is mainly useful for data in object storage.
Filter
A rule that determines which raw data should be validated.
Fixed-batch
A window type that groups a fixed number (N) of consecutive events into each window. For example, "100 events per window".
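As a rough illustration of fixed-batch grouping, the sketch below collects consecutive events into windows of N items. The function name `fixed_batch_windows` is hypothetical, and emitting a final, smaller window for leftover events is an assumption; the glossary does not say how incomplete batches are handled.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fixed_batch_windows(events: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive windows of `size` events each."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    window: List[T] = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield window
            window = []
    # Assumption: leftover events form a final, smaller window.
    if window:
        yield window
```

With `size=2`, five events would be grouped into two full windows and one trailing window containing the last event.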
Record
A collection of related fields treated as a unit, such as a row in a database table or an event in a stream.
Segmentation
A subset of a dataset, defined by a segmentation field (dimension). For example, if "gender" is a segmentation field, then "male" and "female" could be two segments.
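To make the idea concrete, the following sketch splits a list of records into segments keyed by the value of a segmentation field. It is an illustration under stated assumptions, not Validio's API: records are modeled as plain dictionaries, and `segment_records` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def segment_records(
    records: Iterable[Record], segmentation_field: str
) -> Dict[Any, List[Record]]:
    """Group records by the value they hold in the segmentation field."""
    segments: Dict[Any, List[Record]] = defaultdict(list)
    for record in records:
        # Each distinct field value defines one segment.
        segments[record[segmentation_field]].append(record)
    return dict(segments)
```

Calling `segment_records(records, "gender")` on records whose "gender" field is either "male" or "female" would return a dictionary with those two keys, each mapping to the records in that segment.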
Shallow Data Observability
Data observability that is not deep. That is, not comprehensive along the six dimensions of data sources, data formats, data granularity, validator configuration, cadence, and user focus.
Sources
Sources or Source connectors are used to integrate your data source with Validio, and read data into the platform. Data read by Validio can then be monitored and validated.
Validator
A component in Validio that validates data, that is, makes sure the data behaves as expected. Validators can consist of, for example, filters, aggregations, and rules.
Validio / the platform
Validio or the platform refers to the Validio platform, unless otherwise specified.
Window
A subset of a dataset, defined by a rule for grouping records. Windows can be based on time, count, or file.