Data Warehouse

Validio supports many of the major data warehouses that modern data teams work with today.

General considerations

Validio reads and ingests data incrementally, which requires a cursor field (incremental field) and a lookback time. The lookback time indicates how far back in time Validio ingests data from your source.

A cursor field is a timestamp that specifies when data was added or updated:

  • The cursor field must be a timestamp.
  • The cursor field should not include NULL values. Records where the cursor field is NULL are ignored.
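
As a rough illustration of the incremental read described above, the following Python sketch selects the records whose cursor field falls within the lookback window and skips records where the cursor field is NULL. The function name select_incremental, the field name updated_at, and the record layout are hypothetical and only serve the example; they are not part of Validio.

```python
from datetime import datetime, timedelta, timezone


def select_incremental(records, cursor_field, lookback, now):
    """Return records whose cursor value lies within the lookback window.

    Records with a NULL (None) cursor value are skipped.
    """
    window_start = now - lookback
    return [
        r for r in records
        if r.get(cursor_field) is not None and r[cursor_field] >= window_start
    ]


# Hypothetical sample data: one recent row, one old row, one NULL cursor.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
records = [
    {"id": 1, "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": None},
]

selected = select_incremental(records, "updated_at", timedelta(days=7), now)
print([r["id"] for r in selected])
```

With a lookback of seven days, only the record with id 1 is selected: id 2 is older than the window and id 3 has no cursor value.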

For Validio to read data efficiently, we recommend that you apply an optimization such as indexing, clustering, or partitioning to your cursor field.

Depending on your data and the type of quality issues you want to identify, you can either provide an added-timestamp or an updated-timestamp. The added-timestamp indicates added data, while the updated-timestamp indicates updated data.

Note: These timestamps have different implications because of the nature of incremental reads. For example, an updated-timestamp captures updates to existing records, so record counts reflect the number of updates rather than the number of records. Conversely, an added-timestamp yields record counts that reflect the number of added records and ignores updates to existing rows.
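
To make the difference concrete, the sketch below simulates two consecutive incremental reads over a small table and counts the rows each read would pick up with either cursor field. The table contents, the field names created_at and updated_at, and the helper rows_in_window are assumptions made for this example.

```python
from datetime import date


def rows_in_window(table, cursor_field, start, end):
    """Rows whose cursor value falls in the half-open window [start, end)."""
    return [row for row in table if start <= row[cursor_field] < end]


# Table as seen by the first read: two rows added on January 1.
snapshot_1 = [
    {"id": 1, "created_at": date(2024, 1, 1), "updated_at": date(2024, 1, 1)},
    {"id": 2, "created_at": date(2024, 1, 1), "updated_at": date(2024, 1, 1)},
]
# Table as seen by the second read: row 1 was updated and row 3 was added
# on January 2.
snapshot_2 = [
    {"id": 1, "created_at": date(2024, 1, 1), "updated_at": date(2024, 1, 2)},
    {"id": 2, "created_at": date(2024, 1, 1), "updated_at": date(2024, 1, 1)},
    {"id": 3, "created_at": date(2024, 1, 2), "updated_at": date(2024, 1, 2)},
]

reads = [
    (snapshot_1, date(2024, 1, 1), date(2024, 1, 2)),
    (snapshot_2, date(2024, 1, 2), date(2024, 1, 3)),
]

added_counts = [len(rows_in_window(t, "created_at", s, e)) for t, s, e in reads]
updated_counts = [len(rows_in_window(t, "updated_at", s, e)) for t, s, e in reads]

print("added-timestamp counts:  ", added_counts)
print("updated-timestamp counts:", updated_counts)
```

In this sketch the added-timestamp counts sum to the three distinct records, while the updated-timestamp counts sum to four because the update to row 1 is picked up again by the second read.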

We recommend that you provide an updated-timestamp.


Flatten nested data

Nested data must be flattened before it is ingested and validated in Validio. This applies only to the field to be validated; other fields don't need to be flattened.
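
The following minimal sketch shows the shape of such a flattening step in Python: only the nested value to be validated is promoted to a top-level field, and everything else is left untouched. The helper flatten_field and the sample record are hypothetical. In a warehouse, the same transformation would be expressed in that warehouse's own SQL.

```python
def flatten_field(record, path, output_name):
    """Copy the nested value found at `path` into a top-level field.

    All other fields, nested or not, are left as they are.
    """
    value = record
    for key in path:
        value = value[key]
    flat = dict(record)  # shallow copy so the input record is not modified
    flat[output_name] = value
    return flat


# Hypothetical record: only customer.address.country is to be validated.
order = {
    "id": 7,
    "customer": {"address": {"country": "SE"}},
    "items": [{"sku": "a"}],
}

flat_order = flatten_field(
    order, ["customer", "address", "country"], "customer_country"
)
print(flat_order["customer_country"])
```

The items field stays nested in flat_order because it is not the field being validated.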

We are currently working on adding support for semi-structured data.


Avoid full table scans

Efficient ingestion into the Validio platform requires that the cursor field has an index, or a similar mechanism that avoids full table scans.
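
One way to check whether a filter on the cursor field avoids a full table scan is to inspect the query plan before and after adding an index. The sketch below uses SQLite from the Python standard library purely as a stand-in; the table events, the column updated_at, and the index name are assumptions for the example, and your warehouse may rely on partitioning or clustering rather than an index.

```python
import sqlite3


def query_plan(connection, sql, params):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT, payload TEXT)"
)

# An incremental read filters on the cursor field.
query = "SELECT id, payload FROM events WHERE updated_at >= ?"

plan_without_index = query_plan(conn, query, ("2024-01-01",))

conn.execute("CREATE INDEX idx_events_updated_at ON events (updated_at)")
plan_with_index = query_plan(conn, query, ("2024-01-01",))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index, the plan reports a scan of the whole table; with the index, it reports a search that uses idx_events_updated_at. The exact wording of the plan differs between SQLite versions.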