About Sources

Validio can validate structured and semi-structured data from many data sources.

Supported Data Sources

Configure a Source to authenticate Validio and define the data source to validate. Supported source types include the Data Warehouses, Object Storage, and Data Streams listed in this table:

Schema

Validio defines a schema for every source, either based on metadata or inference.

Schema from metadata:

Validio reads the schema from the metadata in the data source, for example, from INFORMATION_SCHEMA in a Data Warehouse. This is true for most structured data types.

Schema from inference:

Validio infers the schema from the existing data when no predefined schema exists. This is true for most semi-structured data types, for example, JSON. Depending on your Source type, it might take a few seconds to infer the schema.
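
To make the idea of inference concrete, here is a minimal sketch that derives field types from a couple of JSON records. It is an illustration only and not Validio's actual inference logic; the function names `infer_type` and `infer_schema`, the type labels, and the sample records are all assumptions made for this example.

```python
import json


def infer_type(value):
    """Map a value parsed from JSON to a simple type label (labels are hypothetical)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def infer_schema(records):
    """Return {field name: sorted list of observed type labels} across all records."""
    observed = {}
    for record in records:
        for field, value in record.items():
            observed.setdefault(field, set()).add(infer_type(value))
    return {field: sorted(types) for field, types in observed.items()}


# Hypothetical sample data, one JSON document per line.
sample_lines = [
    '{"id": 1, "name": "a", "tags": ["x", "y"]}',
    '{"id": 2, "name": null, "price": 9.5}',
]
records = [json.loads(line) for line in sample_lines]
schema = infer_schema(records)
print(schema)
```

Note that `infer_schema([])` returns an empty dictionary: with no records there is nothing to infer from, which mirrors the requirement that data must exist in the source.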

🚧

Validio can only infer schema when data exists in the source.

Detect schema changes

Validio automatically detects schema changes for structured data types in Data Warehouses and for files in Object Storage. Schema checks run hourly, and any detected schema changes are reported as Incidents.
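
As a rough illustration of what a schema check compares, the sketch below diffs two schema snapshots and lists added, removed, and retyped columns. This is a hedged example, not Validio's implementation: the `diff_schemas` function, the snapshot format (a mapping of column name to type name), and the sample columns are assumptions.

```python
def diff_schemas(previous, current):
    """Compare two {column: type} snapshots and list the differences."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    type_changed = sorted(
        column
        for column in set(previous) & set(current)
        if previous[column] != current[column]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


# Hypothetical snapshots taken at two consecutive checks.
previous = {"id": "INTEGER", "email": "VARCHAR", "age": "INTEGER"}
current = {"id": "INTEGER", "email": "TEXT", "signup_date": "DATE"}

changes = diff_schemas(previous, current)
print(changes)
```

In this sketch, any non-empty list in the result would correspond to a schema change worth reporting.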


Semi-structured data

In addition to structured data, Validio supports semi-structured and other complex data types. You can select these fields or certain nested fields when you configure a Source.

📘

JSONPath

Validio uses JSONPath to represent data structures.

For example, the JSONPath expression some_array.length() represents the size of an array.

Supported semi-structured and complex data types

| Source system | Data type |
| --- | --- |
| BigQuery | JSON, ARRAY, STRUCT |
| PostgreSQL | JSON, JSONB, array types |
| Redshift | SUPER |
| Snowflake | ARRAY, OBJECT, VARIANT |
| GCS | Parquet |
| S3 | Parquet |
| Kafka | JSON, protobuf, Avro |
| Kinesis | JSON, protobuf, Avro |
| Pub/Sub | JSON, protobuf, Avro |
| Pub/Sub Lite | JSON, protobuf, Avro |
| Athena | ARRAY, MAP, STRUCT |

📘

Array support

Currently, Validio does not support data validation within an array.

However, you can validate the size of an array. For each array, Validio adds a computed numeric field named some_array.length().
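
The following sketch shows one way a computed array-length field could be derived from a nested record. It is an assumption-laden illustration rather than Validio's behavior: the `flatten_fields` helper, the dotted naming of nested fields, and the sample record are hypothetical, and only the `.length()` suffix comes from the text above.

```python
def flatten_fields(record, prefix=""):
    """Flatten nested dicts into dotted paths; each array becomes a length field."""
    fields = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            fields.update(flatten_fields(value, path))
        elif isinstance(value, list):
            # Array elements are not exposed; only the size is.
            fields[f"{path}.length()"] = len(value)
        else:
            fields[path] = value
    return fields


# Hypothetical record with a nested object and an array.
record = {"order_id": 7, "customer": {"country": "SE"}, "items": ["a", "b", "c"]}
fields = flatten_fields(record)
print(fields)
```

Here the `items` array contributes a single numeric field, `items.length()`, which can then be validated like any other numeric field.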