A state where an organization has full visibility into its data pipelines. This state enables data teams to improve data quality over time. Data observability is not realized overnight; rather, it is a vision that data teams should always strive towards and continuously come closer to.
The extent to which an organization’s data can be considered fit for its intended purpose. Data quality can be considered along five dimensions: freshness, volume, schema, (lack of) anomalous values, and distribution. It is always relative to the data’s specific business context.
Data that does not fulfill the requirements of data quality and is therefore not considered fit for its intended purpose. In the Validio platform, an incident is triggered when a metric value breaches a set threshold.
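The threshold idea can be illustrated with a minimal sketch. This is not Validio's implementation: the function name `breaches_threshold`, the row-count metric, and the bounds 900 and 1100 are all assumptions made up for the example.

```python
def breaches_threshold(metric_value: float, lower: float, upper: float) -> bool:
    """Return True when the metric falls outside the inclusive range [lower, upper]."""
    return metric_value < lower or metric_value > upper


# Hypothetical metric: number of rows observed in each of five windows.
row_counts = [1020, 998, 1005, 312, 1011]

# Collect (window index, value) pairs for every value that breaches the bounds.
incidents = [
    (index, value)
    for index, value in enumerate(row_counts)
    if breaches_threshold(value, lower=900, upper=1100)
]
```

Here only the fourth window (index 3, value 312) falls outside the assumed bounds, so it is the only one flagged.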
The actual value of a piece of data. For example, "FGK3". A datapoint can be either a scalar value (such as "5") or a non-scalar value (such as a JSON blob).
A defined container of records that have the same schema.
Data observability is considered deep if it is comprehensive in terms of six dimensions: Data sources, data formats, data granularity, validator configuration, cadence, and user focus.
A system designed specifically to help an organization come closer to the state of deep data observability.
Destinations are used to write data in real time to a specified location. For example, identified data anomalies can be stored in a separate table or bucket for further investigation.
A container for datapoints of the same kind. For example, a column in a database or a key in a JSON schema.
Each file, such as a CSV file, is its own window. This is mainly useful for data in object storage.
A rule that determines which raw data should be validated.
Groups a fixed number (N) of consecutive events into each window. For example, "100 events per window".
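Count-based windowing can be sketched in a few lines. The helper `count_windows` is a hypothetical name, and emitting a shorter final window when the events do not divide evenly by N is an assumption of this sketch, not documented platform behavior.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def count_windows(events: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield consecutive windows of n events each, in arrival order."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    window: List[T] = []
    for event in events:
        window.append(event)
        if len(window) == n:
            yield window
            window = []
    if window:
        # Assumption: leftover events form a final, shorter window.
        yield window


# Seven events grouped three per window.
windows = list(count_windows(range(7), 3))
```

With seven events and N = 3, the sketch produces two full windows and one partial window holding the last event.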
A collection of related fields treated as a unit, such as a row in a database table, or events in a stream.
Subset of a dataset, defined by a segmentation-field (dimension). For example, if "gender" is a segmentation-field, then "male" and "female" could be 2 segments.
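As a rough illustration of segmentation, the sketch below splits a list of records by the value of one field. The function `segment_records` and the sample records are hypothetical and only mirror the "gender" example above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def segment_records(
    records: List[Dict[str, Any]], field: str
) -> Dict[Any, List[Dict[str, Any]]]:
    """Group records into segments keyed by the value of the segmentation field."""
    segments: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        segments[record[field]].append(record)
    return dict(segments)


# Hypothetical records with "gender" as the segmentation field.
records = [
    {"id": 1, "gender": "male"},
    {"id": 2, "gender": "female"},
    {"id": 3, "gender": "male"},
]
segments = segment_records(records, "gender")
```

Each distinct value of the field becomes one segment, so metrics can then be computed per segment rather than over the whole dataset.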
Data observability that is not deep. That is, not comprehensive along the six dimensions of data sources, data formats, data granularity, validator configuration, cadence, and user focus.
Sources or Source connectors are used to integrate your data source with Validio, and read data into the platform. Data read by Validio can then be monitored and validated.
A component in Validio that validates data, that is, makes sure the data behaves as expected. Validators can consist of, for example, filters, aggregations, and rules.
Validio or the platform refers to the Validio platform, unless otherwise specified.
Subset of a dataset, defined by a rule to group records. The windows can be based on time, count, or file.
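A time-based window can be sketched as follows, assuming fixed-size, non-overlapping windows aligned to the Unix epoch. That alignment, the function name `time_windows`, and the `(timestamp, payload)` record shape are assumptions for illustration, not a description of how the platform computes windows.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Tuple


def time_windows(
    records: Iterable[Tuple[datetime, Any]], size: timedelta
) -> Dict[datetime, List[Any]]:
    """Group (timestamp, payload) records by the start of their time window."""
    size_seconds = size.total_seconds()
    windows: Dict[datetime, List[Any]] = defaultdict(list)
    for timestamp, payload in records:
        # Round the timestamp down to the nearest multiple of the window size.
        start_epoch = (timestamp.timestamp() // size_seconds) * size_seconds
        start = datetime.fromtimestamp(start_epoch, tz=timezone.utc)
        windows[start].append(payload)
    return dict(windows)


# Hypothetical events with timezone-aware timestamps, grouped into hourly windows.
events = [
    (datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), "a"),
    (datetime(2024, 1, 1, 10, 55, tzinfo=timezone.utc), "b"),
    (datetime(2024, 1, 1, 11, 10, tzinfo=timezone.utc), "c"),
]
hourly = time_windows(events, timedelta(hours=1))
```

The first two events land in the window starting at 10:00 UTC, and the third lands in the window starting at 11:00 UTC.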