Streaming Datasets

A Streaming Dataset defines a batch by the number of records/datapoints it contains

Parameter name and description Mandatory Parameter value
1. Name Arbitrary String
2. Source Configured Source connector
3. Reference source

Second source to connect when reference monitors are used

Configured Source connector
4. Reference sliding window

Number of batches in the reference source used when computing a metric. E.g. with an input of 5, a mean difference monitor takes the mean of the latest batch in the target source and subtracts from it the mean of the 5 latest batches in the reference source

5. Notification rule

Note: Without a notification rule, alerts are visible in the platform UI but are not sent to a notification channel, e.g. Slack

Configured Notification rules
6. Batch size

How many records/datapoints should be included in each batch

7. Batch timeout (sec)

If the time since the last record exceeds the batch timeout, a batch calculation is triggered even if the batch size count has not been reached

8. Partitioned batching

Unchecked: batching logic (record count) will be based on the global record/datapoint count, i.e. batching will be triggered at the same time across partitions

Checked: batching logic (record count) will be based on partitioned record/datapoint count, i.e. partitions will be batched independently from each other

9. Data time feature

Empty: Order of the records/datapoints will be determined by the time of ingestion of the data into Validio

Filled in: The selected feature's timestamp is used to batch records and as the time shown for metrics and alerts. NOTE: To correctly batch and display historical data, i.e. backfilling, a data time feature is needed

Feature with a timestamp format in Source
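
The interplay between batch size, batch timeout, partitioned batching and the reference sliding window can be illustrated with a small sketch. This is not Validio's implementation: the class Batcher, the function mean_difference, the GLOBAL key and all argument names are hypothetical, and the sketch assumes that the reference mean is taken over the pooled values of the latest reference batches. The now argument stands for whichever clock orders the records (ingestion time, or the data time feature when one is set).

```python
from collections import defaultdict
from statistics import mean

GLOBAL = "__all__"  # hypothetical buffer key used when partitioned batching is unchecked


class Batcher:
    """Illustrative record-count batcher with a timeout (assumed behaviour)."""

    def __init__(self, batch_size, batch_timeout_sec, partitioned=False):
        self.batch_size = batch_size
        self.batch_timeout_sec = batch_timeout_sec
        self.partitioned = partitioned
        self.buffers = defaultdict(list)  # buffer key -> pending records
        self.last_seen = {}               # buffer key -> time of the last record

    def _key(self, partition):
        # Checked: one buffer per partition. Unchecked: one shared buffer.
        return partition if self.partitioned else GLOBAL

    def add(self, record, partition, now):
        """Buffer a record; return a full batch once batch_size is reached, else None."""
        key = self._key(partition)
        self.buffers[key].append(record)
        self.last_seen[key] = now
        if len(self.buffers[key]) >= self.batch_size:
            return self._flush(key)
        return None

    def check_timeout(self, now):
        """Flush non-empty buffers whose last record is older than the timeout."""
        flushed = []
        for key in list(self.buffers):
            idle = now - self.last_seen[key]
            if self.buffers[key] and idle > self.batch_timeout_sec:
                flushed.append(self._flush(key))
        return flushed

    def _flush(self, key):
        batch = self.buffers[key]
        self.buffers[key] = []
        return batch


def mean_difference(target_batch, reference_batches, sliding_window):
    """Mean of the latest target batch minus the mean of the pooled values
    from the last `sliding_window` reference batches (pooling is an assumption)."""
    if sliding_window < 1:
        raise ValueError("sliding_window must be at least 1")
    recent = list(reference_batches)[-sliding_window:]
    reference_values = [value for batch in recent for value in batch]
    return mean(target_batch) - mean(reference_values)
```

With partitioned=False, three records from any mix of partitions complete a batch of size 3; with partitioned=True, each partition must reach the batch size on its own, and a partition that goes quiet is only flushed by check_timeout.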