Configuring Windows
Data is validated in batches, called windows, which you can configure on your source. You can then select the configured window when you create new validators or segmentations on the source.
To add a window to a source:
- Navigate to the source's details page and click the Windows tab.
- Click + New window.
- Under Select Type, choose the Window type that you want to create: Fixed Batch, Global, Tumbling, or File. You can only select window types that are valid for this source. For example, file windows are only supported for object storage sources.
- Under Configuration, specify the required configuration options for your Window type. For more information, refer to the configuration parameters for each Window type.
Note
Windows are configured on a time field. Validio converts the values of this field to UTC, but graphs always display times in your system's timezone.
- (Optional) Under Segment retention period (days), enter the maximum number of days to keep segments when new data has not been seen.
- Enter a Name for the window, or click Generate name to automatically create one based on your configuration.
- Click Create Window.
After creating a window, you can add validators to use the new window. For more information, see Configuring a Validator.
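To illustrate the note on time fields in the steps above, the following minimal Python sketch shows how one instant can be stored in UTC and displayed in a different timezone. The specific offsets (UTC+2 for the source, UTC-5 for the viewer) are assumptions chosen for illustration, and the sketch does not use any Validio API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical source timestamp recorded with a UTC+2 offset.
source_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))

# Converting to UTC changes the wall-clock reading, not the instant.
stored_time = source_time.astimezone(timezone.utc)

# Hypothetical viewer whose system timezone is UTC-5.
displayed_time = stored_time.astimezone(timezone(timedelta(hours=-5)))

print(stored_time.isoformat())
print(displayed_time.isoformat())
```

All three values refer to the same moment, so window boundaries computed in UTC are unaffected by the timezone in which a graph is viewed.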
Configuration Parameters
Fixed Batch Window
Field | Value | Description |
---|---|---|
Data-time field | Field name | Identifier for the index field used to configure the Window. |
Batch size | Numeric | Number of datapoints (rows) of the Window. For example, |
Segmented batching | True | If |
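As a rough illustration of the Fixed Batch parameters above, the sketch below orders rows by a time field and groups them into batches of a fixed number of rows. The function name `fixed_batches`, the dictionary row format, and the decision to hold back a trailing partial batch are all assumptions made for this example. It is not Validio's implementation, and it does not model the Segmented batching option.

```python
from typing import Any, Dict, List


def fixed_batches(
    rows: List[Dict[str, Any]], time_field: str, batch_size: int
) -> List[List[Dict[str, Any]]]:
    """Group rows into consecutive batches of exactly `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # Order rows by the configured time field before batching.
    ordered = sorted(rows, key=lambda row: row[time_field])
    batches = []
    # Assumption: only complete batches are emitted; leftover rows wait
    # until enough new rows arrive to fill the next batch.
    for start in range(0, len(ordered) - batch_size + 1, batch_size):
        batches.append(ordered[start:start + batch_size])
    return batches


rows = [{"ts": value} for value in [5, 1, 3, 2, 4, 7, 6]]
batches = fixed_batches(rows, time_field="ts", batch_size=3)
print([[row["ts"] for row in batch] for batch in batches])
```

With seven rows and a batch size of 3, the sketch produces two complete batches and leaves one row unassigned.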
Global Window
The Global window requires no configuration.
Tumbling Window
Field | Value | Description |
---|---|---|
Data-time field | Field name | The name of the field that references the timestamp associated with each record or row in the data source. |
Window size | Numeric | Length of the Window in the selected time unit. For example, |
Unit | Minute | Unit of time to define Window size. |
Disable window timeout | True | Set to true if the window should be automatically closed without considering the most recent data-time. |
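The Window size and Unit parameters above can be pictured as fixed-length, non-overlapping time intervals. The sketch below maps a timestamp to the start of the interval that contains it. The function name `tumbling_window_start` and the alignment of window boundaries to the Unix epoch are assumptions for illustration, since the parameters above do not state how Validio aligns boundaries. The Disable window timeout option is not modeled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: window boundaries are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window_start(timestamp: datetime, size: timedelta) -> datetime:
    """Return the start of the tumbling window that contains `timestamp`.

    `timestamp` must be timezone-aware.
    """
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    elapsed = timestamp.astimezone(timezone.utc) - EPOCH
    # Integer-divide to count whole windows since the epoch, then scale back.
    return EPOCH + (elapsed // size) * size


size = timedelta(minutes=30)  # Window size = 30, Unit = Minute
event = datetime(2024, 1, 1, 10, 47, tzinfo=timezone.utc)
print(tumbling_window_start(event, size).isoformat())
```

Every record whose time field falls in the same 30-minute interval maps to the same window start, and each record belongs to exactly one window.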
File Window
Field | Value | Description |
---|---|---|
Data-time field | Field name | Identifier for the field used to configure the Window. |
Note
File window datasets are often used for distribution shift validation, such as monitoring data drift in ML use cases.
If you use a production training dataset as the reference dataset, you can monitor distribution shift metrics between the reference dataset and newly collected data as it arrives.
For information on numeric reference metrics, such as relative entropy, refer to the Numeric distribution or Categorical distribution Validator types.
Segment Retention Policy
Segment retention period (days) is an optional setting on validator windows that sets a threshold for removing segments that may have become stale. A segment is considered stale when the time since data was last processed on that segment exceeds the retention period.
Note
The threshold is relative to the most recent segment that was processed. When left unset, Validio does not clean or remove stale segments.
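One way to read the retention rule above is sketched below: a segment is stale when its last processed time trails the most recently processed segment by more than the retention period. The function name `stale_segments` and the mapping of segment names to timestamps are hypothetical, and the exact comparison Validio applies may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_segments(
    last_processed: Dict[str, datetime], retention_days: Optional[int]
) -> List[str]:
    """Return the names of segments older than the retention threshold."""
    # When the retention period is unset, no segments are removed.
    if retention_days is None or not last_processed:
        return []
    # The threshold is measured from the most recently processed segment,
    # not from the current wall-clock time.
    reference = max(last_processed.values())
    cutoff = reference - timedelta(days=retention_days)
    return sorted(name for name, seen in last_processed.items() if seen < cutoff)


utc = timezone.utc
segments = {
    "eu": datetime(2024, 3, 10, tzinfo=utc),
    "us": datetime(2024, 3, 1, tzinfo=utc),
    "apac": datetime(2024, 2, 1, tzinfo=utc),
}
print(stale_segments(segments, retention_days=7))
print(stale_segments(segments, retention_days=None))
```

Because the reference point is the latest processed segment, a source that stops receiving data entirely would not cause every segment to expire under this reading.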