Schema Detection

Source Schema list

Validio derives a schema for every source, either from metadata or through schema inference. If data exists in the source, Validio reads the schema automatically.

  • Schema from metadata: For most structured data types, Validio reads the schema from the metadata in the data source. For example, Validio can read the metadata from INFORMATION_SCHEMA in a data warehouse source.
  • Schema from inference: When a predefined schema does not exist, such as for unstructured and semi-structured data types (like JSON), Validio infers the schema from the existing data.
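
As a rough illustration of what inferring a schema from existing data involves, the following Python sketch walks a few JSON records and collects the types observed at each field, including nested fields. It is not Validio's implementation: the helper names `infer_type` and `infer_schema`, the type labels, and the dotted path notation are all assumptions made for this example.

```python
import json


def infer_type(value):
    """Map a decoded JSON value to a coarse type label (labels are illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "unknown"


def infer_schema(records):
    """Return {dotted_field_path: sorted list of type labels seen in the records}."""
    seen = {}

    def walk(obj, prefix):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                walk(value, path)  # descend into nested objects
            else:
                seen.setdefault(path, set()).add(infer_type(value))

    for record in records:
        walk(record, "")
    return {path: sorted(types) for path, types in seen.items()}


rows = [
    json.loads('{"date": "2025-01-01", "user_id": 123, "user_profile": {"age": 20, "name": "Bob"}}'),
    json.loads('{"date": "2025-01-02", "user_id": "124", "user_profile": {"age": null, "name": "Ann"}}'),
]
schema = infer_schema(rows)
for path, types in schema.items():
    print(path, types)
```

In this sketch, a field that maps to more than one type label (here `user_id`, which appears as both an integer and a string) is the kind of field whose data type cannot be settled from the data alone.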

Depending on the data source and data types, you can manually configure the schema by selecting fields (including nested fields) to validate.

Nullable Fields

If the automatically inferred schema does not match what you expect from incoming data, you can change the nullability and data types.

  • Nullable fields and metric validation: Select the Nullable option to include datapoints with NULL values in validation. If a field contains NULL and the option is not selected for that field, the datapoint is excluded from the validator metrics.
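
The effect of the Nullable option on which datapoints reach a metric can be pictured with a small Python sketch. This is only a model of the behavior described above, under the assumption that a datapoint is a dictionary and NULL is represented as `None`; the function `datapoints_for_metric` is hypothetical and not part of Validio.

```python
def datapoints_for_metric(rows, field, nullable):
    """Select the datapoints that would feed a metric computed on `field`.

    If the field is marked nullable, every datapoint is kept.
    Otherwise, datapoints where the field is NULL (None) are left out.
    """
    if nullable:
        return list(rows)
    return [row for row in rows if row.get(field) is not None]


rows = [
    {"user_id": 1, "age": 20},
    {"user_id": 2, "age": None},
    {"user_id": 3, "age": 35},
]

included_when_nullable = datapoints_for_metric(rows, "age", nullable=True)
included_when_not_nullable = datapoints_for_metric(rows, "age", nullable=False)

print(len(included_when_nullable))      # all three datapoints are kept
print(len(included_when_not_nullable))  # the datapoint with a NULL age is dropped
```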

Detecting Schema from Sample Data

Schema detection error with option to Upload sample data file.

When you configure a source, schema inference can time out if the source table it reads from is too large, if it cannot determine the data type of schema fields, or if it detects semi-structured rows with different data types. In any of these cases, you need to upload a sample of the data, and Validio tries to infer the schema from the uploaded file.

This sample data file should be a JSON file with properties for each column or field in the schema and a value representing that field. The following is a basic example of the data file:

{
  "date": "2025-01-01",
  "user_id": 123,
  "user_profile": {
    "age": 20,
    "name": "Bob"
  }
}
Important: The properties (fields and values) used in the uploaded sample data file must match the case used by the data source. For example, if the Snowflake source defaults to uppercase, schema detection expects the properties in the data file to be uppercase.
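
One way to prepare a sample file with consistent casing is to normalize the field names before serializing. The Python sketch below assumes an uppercase source and only transforms field names, leaving values untouched; whether values also need adjusting depends on your source. The helper `with_key_case` is hypothetical, and the round trip through `json.loads` is just a check that the output is valid JSON.

```python
import json


def with_key_case(obj, transform):
    """Return a copy of `obj` with `transform` applied to every field name, recursively."""
    if isinstance(obj, dict):
        return {transform(key): with_key_case(value, transform) for key, value in obj.items()}
    if isinstance(obj, list):
        return [with_key_case(item, transform) for item in obj]
    return obj  # scalars (strings, numbers, booleans, None) are returned unchanged


sample = {
    "date": "2025-01-01",
    "user_id": 123,
    "user_profile": {"age": 20, "name": "Bob"},
}

# Assumption for this example: the source uses uppercase identifiers.
upper_sample = with_key_case(sample, str.upper)
sample_json = json.dumps(upper_sample, indent=2)

# Parsing the text again confirms the file content is well-formed JSON.
json.loads(sample_json)
print(sample_json)
```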

Schema Change Validation

Validio automatically validates schema changes for structured data types in data warehouses and files in object storage. Schema checks run hourly, and any detected schema changes are reported as incidents. For more information, see About Validator Incidents.
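
Conceptually, a schema check compares the schema seen now with the one seen previously. The Python sketch below shows one simple way to express such a comparison, assuming each schema is a flat mapping from field name to type label. It is an illustration only: `diff_schemas` and the example type labels are hypothetical and do not describe how Validio detects or reports changes.

```python
def diff_schemas(previous, current):
    """Compare two {field_name: type_label} mappings and list the differences."""
    previous_fields = set(previous)
    current_fields = set(current)
    return {
        "added": sorted(current_fields - previous_fields),
        "removed": sorted(previous_fields - current_fields),
        "type_changed": sorted(
            field
            for field in previous_fields & current_fields
            if previous[field] != current[field]
        ),
    }


previous_schema = {"date": "string", "user_id": "integer", "age": "integer"}
current_schema = {"date": "string", "user_id": "string", "name": "string"}

changes = diff_schemas(previous_schema, current_schema)
print(changes)
```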