Validio supports many of the major data warehouses that modern data teams work with today.
Validio reads and ingests data incrementally, which requires a cursor field (incremental field) and a lookback time. The lookback time indicates how far back in time Validio ingests data from your source.
A cursor field is a timestamp which specifies when data was updated or added:
- The cursor field must be a timestamp.
- The cursor field should not include NULL values. Records where the cursor field is NULL are ignored.
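The interaction between the lookback time and the cursor field rules can be sketched in a few lines of Python. This is only an illustration and not Validio's ingestion logic: the function `select_incremental`, the field name `updated_at`, and the sample records are all hypothetical, and the sketch covers only the initial lookback window, not how later reads advance.

```python
from datetime import datetime, timedelta, timezone


def select_incremental(records, cursor_field, lookback, now):
    """Return records whose cursor timestamp lies inside the lookback window."""
    window_start = now - lookback
    selected = []
    for record in records:
        cursor = record.get(cursor_field)
        if cursor is None:
            # Records where the cursor field is NULL are ignored.
            continue
        if window_start <= cursor <= now:
            selected.append(record)
    return selected


# Hypothetical sample data: one recent row, one old row, one NULL cursor.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
records = [
    {"id": 1, "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": None},
]

recent = select_incremental(records, "updated_at", timedelta(days=7), now)
recent_ids = [r["id"] for r in recent]
print(recent_ids)
```

With a seven-day lookback, only the first record is selected: the second falls outside the window and the third has no cursor value.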
For Validio to read data efficiently, we recommend that you apply optimizations such as indexing, clustering, or partitioning to your cursor field.
Depending on your data and the type of quality issues you want to identify, you can provide either an added-timestamp or an updated-timestamp. The added-timestamp indicates added data, while the updated-timestamp indicates updated data.
Note: These timestamps have different implications because of the nature of incremental reads. For example, an updated-timestamp captures updates on existing records, so record counts reflect the number of updates instead of the number of records. Conversely, an added-timestamp yields record counts that equal the count of added records and ignores updates to existing rows.
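The effect of the cursor choice on record counts can be seen in a small simulation. The sketch below assumes a simplified model of incremental reads in which each read picks up rows whose cursor value is newer than the highest value seen so far. The function `incremental_count`, the column names `created_at` and `updated_at`, and the snapshots are hypothetical and are not part of Validio.

```python
from datetime import datetime


def incremental_count(snapshots, cursor_field):
    """Count rows picked up across successive reads of a table."""
    last_cursor = None
    total = 0
    for table in snapshots:
        # Pick up only rows whose cursor is newer than anything read so far.
        new_rows = [
            row for row in table
            if last_cursor is None or row[cursor_field] > last_cursor
        ]
        total += len(new_rows)
        if new_rows:
            last_cursor = max(row[cursor_field] for row in new_rows)
    return total


def day(d):
    return datetime(2024, 1, d)


# Table state at the first read: two records.
first_read = [
    {"id": "a", "created_at": day(1), "updated_at": day(1)},
    {"id": "b", "created_at": day(2), "updated_at": day(2)},
]
# Table state at the second read: "a" was updated and "c" was added.
second_read = [
    {"id": "a", "created_at": day(1), "updated_at": day(3)},
    {"id": "b", "created_at": day(2), "updated_at": day(2)},
    {"id": "c", "created_at": day(4), "updated_at": day(4)},
]

added_count = incremental_count([first_read, second_read], "created_at")
updated_count = incremental_count([first_read, second_read], "updated_at")
print(added_count, updated_count)
```

The table holds three records, which is what the added-timestamp cursor counts. The updated-timestamp cursor counts four, because the update to record "a" is read a second time.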
We recommend that you provide an
Flatten nested data
Nested data must be flattened before it is ingested and validated in Validio. This applies only to the fields to be validated; other fields don't need to be flattened.
We are currently working on adding support for semi-structured data.
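As an illustration of what flattening means here, the following sketch turns a nested record into a single level of fields. It is a generic example, not a Validio utility: the helper `flatten`, the underscore separator, and the sample record are assumptions, and in practice the flattening would typically be done in the warehouse itself (for example in a view) before Validio reads the data.

```python
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dictionaries into one level of keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested objects and prefix their keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested record.
nested = {"order_id": 7, "customer": {"id": 42, "address": {"country": "SE"}}}
flat = flatten(nested)
print(flat)
```

After flattening, a field such as `customer_address_country` is an ordinary top-level column that can be validated.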
Avoid full table scans
Efficient ingestion into the Validio platform requires that the cursor field has an index, or a similar mechanism that avoids full table scans.
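One way to check whether a filter on the cursor field triggers a full table scan is to inspect the query plan. The sketch below uses SQLite from Python's standard library purely as a stand-in for a warehouse. The table `events`, the column `updated_at`, and the index name are hypothetical, and the equivalent mechanism in your warehouse may be clustering or partitioning rather than an index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
)

# A typical incremental read: filter on the cursor field.
sql = "SELECT id, payload FROM events WHERE updated_at >= ?"
params = ("2024-01-01T00:00:00",)

plan_before = query_plan(conn, sql, params)

# Add an index on the cursor field and inspect the plan again.
conn.execute("CREATE INDEX idx_events_updated_at ON events (updated_at)")
plan_after = query_plan(conn, sql, params)

print("without index:", plan_before)
print("with index:   ", plan_after)
```

Without the index, the plan reports a scan of the whole table. With it, the plan reports a search that uses `idx_events_updated_at`.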