When setting up an Object storage Source, you need to specify what data to include. This is defined by:
- File location - Point Validio to a primary folder for data retrieval. Data from subfolders will also be included.
- File pattern - Optionally, you can use regular expressions to filter files based on their filenames.
- Note that all files included in the Source need to share the same schema. You will get notified if Validio discovers any schema inconsistencies.
- Missing fields are considered empty.
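As an illustration of how a file pattern narrows down the files under a location, here is a minimal Python sketch. The object keys and the pattern are hypothetical, and the sketch assumes the pattern must match the whole filename (the part after the last `/`). The regular expression dialect and matching rules that Validio applies may differ, so verify your pattern against your own files.

```python
import re

# Hypothetical object keys found under the configured file location,
# including keys in subfolders.
keys = [
    "events/2024-01-01/part-000.csv",
    "events/2024-01-01/part-001.csv",
    "events/2024-01-02/_SUCCESS",
    "events/2024-01-02/part-000.json",
]

# Hypothetical file pattern: keep only CSV part files.
pattern = re.compile(r"part-\d{3}\.csv")


def filename(key):
    """Return the part of the key after the last slash."""
    return key.rsplit("/", 1)[-1]


# Assumption: the pattern is matched against the whole filename.
matching = [key for key in keys if pattern.fullmatch(filename(key))]
print(matching)
```

In this sketch the marker file `_SUCCESS` and the JSON file are filtered out, which also helps keep files with a different schema out of the Source.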
Validio detects new files based on their filename. A file will only be read if its filename is lexicographically greater than the last file in the previous poll.
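The consequence of lexicographic ordering for file naming can be sketched in Python as follows. The helper `new_files` and the filenames are hypothetical, and Python's string comparison stands in for whatever comparison Validio performs internally. The sketch only shows why zero-padded or timestamp-based names are safer than unpadded counters.

```python
def new_files(listed, last_seen):
    """Return the filenames that sort strictly after last_seen."""
    return sorted(name for name in listed if name > last_seen)


# Unpadded counters: "file_10.csv" sorts BEFORE "file_9.csv" because
# the character "1" is smaller than "9", so it would not count as new.
unpadded = ["file_8.csv", "file_9.csv", "file_10.csv"]
print(new_files(unpadded, last_seen="file_9.csv"))

# Zero-padded counters sort in the intended order.
padded = ["file_08.csv", "file_09.csv", "file_10.csv"]
print(new_files(padded, last_seen="file_09.csv"))
```

The first call finds no new files, while the second finds `file_10.csv`.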
Backfilling is limited to 250 files
When backfilling data from Object storage, Validio will only read data from the 250 lexicographically greatest files. Subsequent polls will then read all later files, regardless of their quantity.
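To estimate which files a backfill would cover, you can reproduce the selection yourself. The following is a minimal sketch, assuming the filenames have already been listed and filtered. The function name and the sample filenames are hypothetical, and Python's string ordering is used as a stand-in for the ordering Validio applies.

```python
BACKFILL_LIMIT = 250  # the backfill limit described above


def backfill_selection(filenames):
    """Return the lexicographically greatest files, up to the limit."""
    return sorted(filenames)[-BACKFILL_LIMIT:]


# Hypothetical bucket with 1000 zero-padded files.
names = [f"events_{i:04d}.csv" for i in range(1000)]
selected = backfill_selection(names)

print(len(selected))   # number of files in the backfill
print(selected[0])     # oldest file that is still included
print(selected[-1])    # newest file
```

With these sample names, the backfill would start at `events_0750.csv` and the 750 earlier files would not be read.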
For Object storage sources, Validio adds an additional field, called validio_file_created_at, to the schema. This field contains the timestamp for when the file was created, or updated, in the Object storage. The timestamp is in RFC3339 UTC "Zulu" format.
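If you process this field downstream, an RFC3339 "Zulu" timestamp can be parsed with Python's standard library. This is a minimal sketch. The sample value is hypothetical, and the precision of any fractional seconds in real values is not specified here.

```python
from datetime import datetime, timezone


def parse_rfc3339_zulu(value):
    """Parse an RFC3339 UTC timestamp that ends in "Z"."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Hypothetical value of validio_file_created_at.
created_at = parse_rfc3339_zulu("2024-01-15T08:30:00Z")
print(created_at.isoformat())
```

The result is a timezone-aware `datetime` in UTC, so it can be compared safely with other timezone-aware timestamps.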
Costs associated with reading data from Object storage:
- If the traffic crosses cloud regions, there are potential network costs between Validio and the Object storage.
- The costs for listing objects are negligible.