
About Sources

Validio can validate structured and semi-structured data from many data sources.

Supported Data Sources

Configure a Source to authenticate Validio with a data source and to define the data you want to validate. Supported source types include the Data Warehouses, Object Storage, and Data Streams listed in this table:

Schema

Validio defines a schema for every source, either from metadata or by inference.

Schema from metadata:

Validio reads the schema from the metadata in the data source, for example, from INFORMATION_SCHEMA in a Data Warehouse. This is true for most structured data types.
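To illustrate what reading a schema from metadata (rather than from the rows themselves) means, here is a minimal sketch. It assumes SQLite as the database, which has no INFORMATION_SCHEMA, so its `PRAGMA table_info` statement stands in for `INFORMATION_SCHEMA.COLUMNS`. The table `orders` and the function `schema_from_metadata` are hypothetical and do not describe how Validio is implemented.

```python
import sqlite3


def schema_from_metadata(conn: sqlite3.Connection, table: str) -> dict[str, str]:
    """Return {column_name: declared_type} read from the database catalog.

    Hypothetical helper: SQLite's PRAGMA table_info stands in for
    INFORMATION_SCHEMA.COLUMNS. Only metadata is read, never table rows.
    The table name is interpolated directly, so pass trusted names only.
    """
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name: col_type for _, name, col_type, *_ in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
# The table is empty, yet its schema is still available from metadata.
print(schema_from_metadata(conn, "orders"))
# {'id': 'INTEGER', 'amount': 'REAL', 'status': 'TEXT'}
```

Because the schema comes from the catalog, it is available even for an empty table, unlike an inferred schema.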

Schema from inference:

Validio infers the schema from the existing data when no pre-defined schema exists. This is true for most semi-structured data types, for example JSON. Depending on your Source type, it might take a few seconds to infer the schema.
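As a rough sketch of schema inference, the hypothetical `infer_schema` function below walks a sample of JSON objects and records every type observed for each field, naming nested fields with dotted paths. This is one simple approach under stated assumptions, not Validio's inference algorithm. An empty sample produces an empty schema, which matches the limitation that a schema can only be inferred when data exists.

```python
import json


def json_type(value) -> str:
    """Map a decoded JSON value to a JSON type name."""
    if value is None:
        return "null"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records: list[dict]) -> dict[str, set[str]]:
    """Infer {field_path: {observed types}} from sample JSON objects.

    Hypothetical helper: nested objects are walked recursively and their
    fields are named with dotted paths. An empty sample yields {}.
    """
    schema: dict[str, set[str]] = {}

    def visit(obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            path = f"{prefix}{key}"
            schema.setdefault(path, set()).add(json_type(value))
            if isinstance(value, dict):
                visit(value, path + ".")

    for record in records:
        visit(record, "")
    return schema


sample = [
    json.loads('{"id": 1, "user": {"name": "a"}, "tags": ["x"]}'),
    json.loads('{"id": 2, "user": {"name": null}, "tags": []}'),
]
schema = infer_schema(sample)
print(schema["user.name"])  # both "string" and "null" were observed
```

Collecting a set of types per field, rather than a single type, makes fields that are sometimes null or mixed-type visible in the inferred schema.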

Note: Validio can only infer a schema when data exists in the source.

Detect schema changes

Validio automatically detects schema changes for structured data types in Data Warehouses and for files in Object Storage. Schema checks run hourly, and any detected schema changes are reported as Incidents.
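One way a periodic check could detect schema changes is to compare the schema snapshot from the previous run with the current one. The sketch below assumes schemas are plain `{column: type}` dictionaries. `SchemaChange` and `diff_schemas` are hypothetical names, and the hourly scheduling and incident reporting that Validio performs are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaChange:
    """Differences between two schema snapshots (hypothetical structure)."""

    added: dict[str, str] = field(default_factory=dict)
    removed: dict[str, str] = field(default_factory=dict)
    changed: dict[str, tuple[str, str]] = field(default_factory=dict)

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.changed)


def diff_schemas(old: dict[str, str], new: dict[str, str]) -> SchemaChange:
    """Compare two {column: type} snapshots taken by consecutive checks."""
    change = SchemaChange()
    for col, col_type in new.items():
        if col not in old:
            change.added[col] = col_type
        elif old[col] != col_type:
            change.changed[col] = (old[col], col_type)
    for col, col_type in old.items():
        if col not in new:
            change.removed[col] = col_type
    return change


previous = {"id": "INTEGER", "amount": "REAL", "status": "TEXT"}
current = {"id": "INTEGER", "amount": "TEXT", "created_at": "TIMESTAMP"}
change = diff_schemas(previous, current)
if not change.is_empty():
    # A monitoring system would raise an incident at this point.
    print(change)
```

Separating added, removed, and type-changed columns lets each kind of change be reported distinctly.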


Semi-structured data

In addition to structured data, Validio supports semi-structured and other complex data types. When you configure a Source, you can select fields of these types or specific nested fields within them.

Note: Validio uses JSONPath to represent data structures. For example, the JSONPath expression some_array.length() represents the size of an array.
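To show how a dotted path can select a nested field, or the size of an array, from a semi-structured record, here is a minimal sketch. The `resolve` function is hypothetical and handles only plain key lookups plus a trailing `.length()`; it is a tiny illustrative subset, not a full JSONPath implementation and not Validio's evaluator.

```python
from typing import Any

LENGTH_SUFFIX = ".length()"


def resolve(record: dict, expression: str) -> Any:
    """Evaluate a dotted field path such as "order.items.length()".

    Hypothetical helper: supports key lookups and a trailing ".length()"
    only. Raises KeyError if a key is missing and TypeError if
    ".length()" is applied to something that is not an array.
    """
    wants_length = expression.endswith(LENGTH_SUFFIX)
    if wants_length:
        expression = expression[: -len(LENGTH_SUFFIX)]
    value: Any = record
    for key in expression.split("."):
        value = value[key]
    if wants_length:
        if not isinstance(value, list):
            raise TypeError(f"{expression!r} is not an array")
        return len(value)
    return value


record = {"order": {"id": 7, "items": [{"sku": "a"}, {"sku": "b"}]}}
print(resolve(record, "order.id"))              # 7
print(resolve(record, "order.items.length()"))  # 2
```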

Supported semi-structured and complex data types

Source system      Data type
Athena             ARRAY, MAP, STRUCT
ClickHouse         Tuple (and named Tuples), Nested, Array
GCS                Parquet
Google BigQuery    JSON, ARRAY, STRUCT
Kafka              JSON, protobuf, Avro
Kinesis            JSON, protobuf, Avro
PostgreSQL         JSON, JSONB, array types
Pub/Sub            JSON, protobuf, Avro
Pub/Sub Lite       JSON, protobuf, Avro
Redshift           SUPER
S3                 Parquet
Snowflake          ARRAY, OBJECT, VARIANT

For more information about ClickHouse types, see ClickHouse integration.

Note: Currently, Validio does not support validating data within an array. However, you can validate the size of an array: for each array, Validio adds a computed numeric field named after the array with .length() appended, for example some_array.length().
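The idea of exposing only an array's size as a computed numeric field can be sketched as follows. The hypothetical `add_array_lengths` function flattens a record into dotted paths and, for every array, emits a `<path>.length()` field instead of descending into its elements. This is an illustration under those assumptions, not Validio's implementation.

```python
from typing import Any


def add_array_lengths(record: dict) -> dict[str, Any]:
    """Flatten a record to dotted paths, replacing arrays by their length.

    Hypothetical helper: array contents are not descended into; only a
    numeric "<path>.length()" field is produced for each array.
    """
    fields: dict[str, Any] = {}

    def visit(obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                visit(value, path + ".")
            elif isinstance(value, list):
                fields[f"{path}.length()"] = len(value)
            else:
                fields[path] = value

    visit(record, "")
    return fields


event = {"user": {"id": 3, "roles": ["admin", "dev"]}, "tags": []}
print(add_array_lengths(event))
# {'user.id': 3, 'user.roles.length()': 2, 'tags.length()': 0}
```

Empty arrays still produce a length field with value 0, so a validator could flag, for example, records whose array is unexpectedly empty.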