About Sources
Validio can validate structured and semi-structured data from many data sources.
Supported Data Sources
Configure a Source to authenticate Validio and define a data source to validate. Supported source types include the Data Warehouses, Object storage, and Data streams listed in this table:
Schema
Validio defines a schema for every source, either based on metadata or inference.
Schema from metadata:
Validio reads the schema from the metadata in the data source, for example, from `INFORMATION_SCHEMA` in a Data Warehouse. This is true for most structured data types.
Schema from inference:
Validio infers the schema from the existing data when no pre-defined schema exists. This is true for most semi-structured data types, for example, JSON. Depending on your Source type, it might take a few seconds to infer the schema.
Validio can only infer schema when data exists in the source.
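To make the idea of inference concrete, here is a minimal Python sketch that derives a field-to-type mapping from sample JSON records. It is an illustration only, not Validio's implementation: the `type_name` and `infer_schema` helpers and the type labels are hypothetical, and only top-level fields are considered.

```python
import json


def type_name(value):
    """Return an illustrative type label (hypothetical names, not Validio's)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_schema(records):
    """Map each top-level field to the set of type labels observed for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type_name(value))
    return schema


# Sample newline-delimited JSON, as it might appear in a stream or file.
lines = [
    '{"id": 1, "tags": ["a", "b"], "price": 9.5}',
    '{"id": 2, "tags": [], "note": "late"}',
]
records = [json.loads(line) for line in lines]
schema = infer_schema(records)

# With no records there is nothing to inspect, so the result is empty.
empty_schema = infer_schema([])
```

The empty result for an empty input mirrors the constraint above: without existing data, there is nothing to infer a schema from.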
Detect schema changes
Validio automatically checks for schema changes in structured data types in Data Warehouses and in files in Object Storage. Schema checks run hourly, and any detected schema changes are reported as Incidents.
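Conceptually, a schema check compares the schema seen in the previous check with the current one. The Python sketch below shows one way such a comparison could work, assuming each schema is a plain mapping from field name to type name. The `diff_schemas` function and the sample fields are hypothetical and do not reflect how Validio performs the check internally.

```python
def diff_schemas(previous, current):
    """Report fields that were added, removed, or changed type."""
    added = {f: current[f] for f in current.keys() - previous.keys()}
    removed = {f: previous[f] for f in previous.keys() - current.keys()}
    changed = {
        f: (previous[f], current[f])
        for f in previous.keys() & current.keys()
        if previous[f] != current[f]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical schemas from two consecutive checks.
previous = {
    "id": "INTEGER",
    "email": "STRING",
    "created": "DATE",
    "legacy_flag": "BOOLEAN",
}
current = {
    "id": "INTEGER",
    "email": "STRING",
    "created": "TIMESTAMP",
    "country": "STRING",
}

changes = diff_schemas(previous, current)
has_changes = any(changes.values())
```

In this example, `country` is reported as added, `legacy_flag` as removed, and `created` as changed from `DATE` to `TIMESTAMP`. Any non-empty category would correspond to a detected schema change.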
Semi-structured data
In addition to structured data, Validio supports semi-structured and other complex data types. You can select these fields or certain nested fields when you configure a Source.
JSONPath
Validio uses JSONPath to represent data structures.
For example, the JSONPath expression `some_array.length()` represents the size of an array.
Supported semi-structured and complex data types
Source system | Data type |
---|---|
BigQuery | JSON, ARRAY, STRUCT |
PostgreSQL | JSON, JSONB, array types |
Redshift | SUPER |
Snowflake | ARRAY, OBJECT, VARIANT |
GCS | Parquet |
S3 | Parquet |
Kafka | JSON, protobuf, Avro |
Kinesis | JSON, protobuf, Avro |
Pub/Sub | JSON, protobuf, Avro |
Pub/Sub Lite | JSON, protobuf, Avro |
Athena | ARRAY, MAP, STRUCT |
Array support
Currently, Validio does not support data validation within an array.
However, you can validate the size of an array. For each array, Validio adds a computed numeric field named `some_array.length()`.
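As a rough illustration of what such a computed field looks like, the Python sketch below adds a `<field>.length()` entry for every top-level array in a record. The `add_array_lengths` helper and the sample record are hypothetical, and handling of nested arrays is left out. Validio derives these fields itself, so this only shows the resulting shape.

```python
def add_array_lengths(record):
    """Return a copy of the record with a '<field>.length()' entry per array."""
    enriched = dict(record)
    for field, value in record.items():
        if isinstance(value, list):
            # The array's contents are not inspected, only its size.
            enriched[f"{field}.length()"] = len(value)
    return enriched


# Hypothetical record with two array fields.
row = {"order_id": 7, "items": ["pen", "ink", "pad"], "coupons": []}
enriched = add_array_lengths(row)
```

Here `enriched` gains the numeric fields `items.length()` and `coupons.length()`, which can then be validated like any other numeric field, for example to flag orders with no items.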