Object storage

Validio supports the major object storage services that modern data teams use.

General considerations

Consider the following when you ingest and validate fields from CSV files in object storage:

  • You must specify a folder when configuring an object storage source connector.
    Validio ingests files from the folder recursively, including nested folders.
  • Files are read in order of their modification time, from earliest to most recent.
  • You can use regular expressions to filter which files to ingest.
  • The CSV schema must be consistent across all files within the specified top-level folder:
    1. For each of the files, the connector expects the same fields and field names in the same order.
    2. Missing fields are interpreted as empty fields, which affects any analytics involving those fields.
  • New files are detected based on the file timestamp attribute that indicates when the file was last updated. This attribute is specific to each object storage provider:
    1. Last-modified attribute on S3
    2. timeCreated attribute on GCS