Object Store Datasets

Object Store Datasets define batches by file or blob; for example, one CSV file is one batch.

Parameter name and description Mandatory Parameter value
1. Name

Arbitrary String
2. Source

Configured Source connector
3. Reference source

Second source to connect when reference monitors are used

Configured Source connector
4. Reference sliding window

Number of batches in the reference source used when computing a metric. For example, with an input of 5 and a mean difference monitor, the mean of the 5 latest batches in the reference source is subtracted from the mean of the latest batch in the target source.

Integer
5. Notification rule

Note: Without a notification rule, alerts are visible in the platform UI but are not sent to a notification channel such as Slack.

Configured Notification rules
6. Data time feature

Empty: The order of the records/datapoints is determined by the time the data was ingested into Validio.

Filled in: Mechanism used to batch records, and the time used to show metrics and alerts. NOTE: To correctly batch and display historical data (i.e. backfilling), a data time feature is needed.

Feature with a timestamp format in Source
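
The arithmetic behind the reference sliding window (parameter 4) can be illustrated with a small sketch. This is not Validio's implementation: the function name `mean_difference` and the list-of-lists batch representation are hypothetical, and the sketch assumes the reference mean is pooled over all records in the window rather than averaged per batch.

```python
from statistics import fmean


def mean_difference(target_batches, reference_batches, window):
    """Mean of the latest target batch minus the pooled mean of the
    latest `window` reference batches (hypothetical helper)."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    if not target_batches or not reference_batches:
        raise ValueError("both sources need at least one batch")

    latest_target = target_batches[-1]
    # Keep only the most recent `window` reference batches.
    reference_window = reference_batches[-window:]
    # Assumption: records from all batches in the window are pooled.
    pooled = [value for batch in reference_window for value in batch]
    return fmean(latest_target) - fmean(pooled)


# Each inner list stands for one batch, e.g. one CSV file.
target = [[1.0, 2.0], [10.0, 12.0]]
reference = [
    [0.0, 0.0],  # falls outside a window of 5
    [1.0, 3.0],
    [2.0, 4.0],
    [3.0, 5.0],
    [4.0, 6.0],
    [5.0, 7.0],
]

result = mean_difference(target, reference, window=5)
print(result)  # 11.0 - 4.0 = 7.0
```

With a window of 5, the oldest reference batch is ignored, so only the five most recent files contribute to the reference mean.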

📘

Object Store Datasets in distribution shift validation

Object Store Datasets are often used in ML use cases to monitor data drift. With a production training dataset as the reference dataset, distribution shift metrics can be monitored between the reference dataset and newly collected data as it arrives. Learn more about relative entropy and other numeric reference metrics here
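
As a rough illustration of such a distribution shift metric, the sketch below estimates relative entropy (Kullback-Leibler divergence) between a newly collected batch and a reference dataset. It is a minimal example under stated assumptions, not the platform's method: the function name `relative_entropy`, the equal-width binning, the smoothing constant `eps`, and the direction of the divergence (new data relative to reference) are all choices made for this example.

```python
import math


def relative_entropy(new_values, reference_values, bins=10, eps=1e-9):
    """Estimate KL(new || reference) from equal-width histograms
    built over the shared range of both datasets (illustrative only)."""
    if not new_values or not reference_values:
        raise ValueError("both datasets must be non-empty")

    lo = min(min(new_values), min(reference_values))
    hi = max(max(new_values), max(reference_values))
    if hi == lo:
        return 0.0  # every value is identical, so there is no shift
    width = (hi - lo) / bins

    def probabilities(values):
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
        total = len(values) + eps * bins
        # Additive smoothing keeps empty bins from producing log(0).
        return [(c + eps) / total for c in counts]

    p = probabilities(new_values)
    q = probabilities(reference_values)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


reference = [i / 10 for i in range(100)]  # stand-in for a training dataset
shifted = [v + 5 for v in reference]      # stand-in for drifted new data

same = relative_entropy(reference, reference)
drift = relative_entropy(shifted, reference)
print(same, drift)
```

Comparing the reference dataset with itself gives a divergence of zero, while the shifted batch gives a clearly positive value; a monitor would alert when the value crosses a chosen threshold.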