Data Warehouse Datasets

Data Warehouse Datasets define batches by specifying a timeout

Parameter name and description Mandatory Parameter value
1. Name Arbitrary String
2. Source Configured Source connector
3. Reference source

Second source to connect when reference monitors are used

Configured Source connector
4. Reference sliding window

Number of batches in the reference source used when computing a metric. For example, with an input of 5, a mean difference monitor computes the difference between the mean of the latest batch in the target source and the mean of the 5 latest batches in the reference source

5. Notification rule

Note: Without a notification rule, alerts are visible in the platform UI but are not sent to a notification channel, e.g. Slack

Configured Notification rules
6. Partitioned batching

Unchecked: batching logic (timeout) is based on the latest record/datapoint globally, i.e. batching is triggered at the same time across all partitions

Checked: batching logic (timeout) is based on the latest record/datapoint within each partition, i.e. partitions are batched independently of each other

7. Timeout (seconds)

If no new data has been ingested after the specified timeout, a new batch is created

8. Maximum session time (seconds)

If a timeout is never triggered because new data keeps being ingested within the specified timeout, the maximum session time forces a new batch calculation

9. Data time feature

Empty: The order of the records/datapoints is determined by the time the data was ingested into Validio

Filled in: The selected feature is used to batch records and supplies the time used to show metrics and alerts. Note: To correctly batch and display historical data, i.e. backfilling, a data time feature is needed

Feature with a timestamp format in Source
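
The interaction between Timeout, Maximum session time and Partitioned batching (parameters 6 to 8) can be illustrated with a small offline simulation. This is only a sketch under stated assumptions, not Validio's implementation: the function name `split_into_batches`, the record layout `(timestamp, partition, payload)` and the exact boundary conditions (a gap strictly greater than the timeout, a session age greater than or equal to the maximum) are hypothetical choices made for illustration. The timestamp stands for ingestion time, or for the data time feature when one is configured.

```python
from typing import Any, Dict, Hashable, List, Optional, Tuple

# Hypothetical record layout: (timestamp in seconds, partition key, payload).
Record = Tuple[float, Hashable, Any]
Batch = List[Record]


def split_into_batches(
    records: List[Record],
    timeout: float,
    max_session_time: float,
    partitioned: bool = False,
) -> List[Tuple[Optional[Hashable], Batch]]:
    """Group records into batches using a timeout and a maximum session time.

    A batch is closed when the gap since its last record exceeds `timeout`,
    or when the batch has been open for at least `max_session_time`.
    With partitioned=False all records share one global batch stream;
    with partitioned=True each partition key is batched independently.
    """
    open_batches: Dict[Optional[Hashable], Batch] = {}
    closed: List[Tuple[Optional[Hashable], Batch]] = []

    for ts, partition, payload in sorted(records, key=lambda r: r[0]):
        key = partition if partitioned else None
        batch = open_batches.get(key)
        if batch:
            idle = ts - batch[-1][0]  # time since the last record in this batch
            age = ts - batch[0][0]    # how long this batch has been open
            if idle > timeout or age >= max_session_time:
                closed.append((key, batch))
                batch = None
        if batch is None:
            batch = []
            open_batches[key] = batch
        batch.append((ts, partition, payload))

    # Close whatever is still open once the input is exhausted.
    for key, batch in open_batches.items():
        if batch:
            closed.append((key, batch))
    return closed


if __name__ == "__main__":
    sample = [(0, "a", 1), (5, "b", 2), (8, "a", 3), (30, "a", 4), (32, "b", 5)]
    for partitioned in (False, True):
        batches = split_into_batches(sample, timeout=10, max_session_time=100,
                                     partitioned=partitioned)
        print(partitioned, [(key, [r[0] for r in b]) for key, b in batches])
```

In this simulation a batch is closed when the next record arrives, whereas a live system would close it as soon as the timeout elapses. The resulting batch contents are the same, only the moment of closing differs.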
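
The Reference sliding window (parameter 4) can be sketched in the same spirit. The helper below is hypothetical and rests on two assumptions the description does not settle: the result is taken as the target mean minus the reference mean, and the reference mean is computed over all values pooled from the selected batches rather than as a mean of per-batch means.

```python
from statistics import mean
from typing import List, Sequence


def mean_difference(
    target_batches: Sequence[Sequence[float]],
    reference_batches: Sequence[Sequence[float]],
    window: int,
) -> float:
    """Mean of the latest target batch minus the mean of the `window`
    latest reference batches (values pooled across those batches)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if not target_batches or not reference_batches:
        raise ValueError("both sources need at least one batch")

    latest_target = target_batches[-1]
    # Slicing keeps all available batches when fewer than `window` exist.
    recent_reference = reference_batches[-window:]
    pooled: List[float] = [v for batch in recent_reference for v in batch]
    return mean(latest_target) - mean(pooled)


if __name__ == "__main__":
    target = [[1, 2, 3], [10, 20]]
    reference = [[100], [1, 3], [5, 7]]
    print(mean_difference(target, reference, window=2))
```

A larger window smooths the reference value over more history, so a single unusual reference batch has less influence on the computed difference.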