Freshness
Freshness validators evaluate the time elapsed since the data was last updated on the source. Freshness works on "data time"--which means that it validates the actual date-time or timestamp column in the table (versus the window end time).
You can backfill data and use segmentation on Freshness validators.
Recommendation
The following setup applies to the majority of use cases for monitoring Freshness:
- Use a daily tumbling window with "Disable Window timeout" checked.
- Set the polling schedule to daily, 1-2 hours after the expected pipeline job completion.
- (Optional) Configure a segmentation field.
Note
You can only configure the Freshness validator to use a Global window or Tumbling window. For more information, see Choosing the Window Type.
Add a Freshness Validator
To add a Freshness validator:
- Navigate to the Source where you will add the validator, and click + New Validator.
- Under Validator Type, select Freshness.
- Under Segmentation, select an existing segmentation or create one. Default is Unsegmented.
- Under Window, select an existing Tumbling or Global window or create one.
- Under Source Config, select a Filter Type.
You can choose from: No Filter (Default), Boolean, Enum, Null, String, and Threshold. For more information, see Filters.
- Under Config, the Initialize with backfill option is checked.
- Under Threshold, select the Threshold Type and fill in the settings for Preset, Sensitivity, and Decision Bounds. For more information, see About Thresholds.
- Click Continue to create the Freshness validator.
- After the validator is created, click View validator to open its details page.
It takes a few minutes for data to start populating the graph and table on this page, depending on the polling schedule and whether you manually backfill the validator.
Choosing the Window Type
You can only configure Freshness validators to use tumbling or global windows. For more information, see About Windows.
In general, tumbling windows are recommended whenever possible to enable instant dynamic threshold training and backfill of historic data.
Global windows can be used when you:
- Want to set up freshness validation on a large number of tables and don't need backfill or segmentation on these tables.
- Prefer the X-axis to show polling time instead of data time (window end time).
Interpreting the Freshness Graph
The time represented in the Freshness graph depends on the window type: data-time for a Tumbling window, and polling time for a Global window.

Freshness with a Tumbling Window
Tumbling windows are configured with a date-time field to represent the data time.
- X-axis--The timestamps on the X-axis represent the end time for each window, using the date-time field. The window end time is based on UTC, but the graph displays the local system time. For example, daily windows always run from 00:00 UTC to 00:00 UTC, and hourly windows end at 01:00 UTC, 02:00 UTC, 03:00 UTC, and so on. The window end time is converted to the local system time before it is displayed on the graph.
- Y-axis--The Y-axis displays the difference between window end time and the latest timestamp within the window. Any data containing dates later than the window end time, such as timestamps in the future, will be used in the freshness calculation for the next window.
Example
Tumbling window, as captured in a screenshot:
- A daily window closed on Oct 16th 00:00 UTC. The X-axis displays Oct 16th 02:00 because the local system time was UTC+2 on the machine where the screenshot was captured.
- For the window that closed on Oct 16th 02:00 UTC+2, the Y-axis value is 2 days 1 hour, meaning that the latest timestamp that existed in the data at the time of the window closing was Oct 14th 01:00 UTC+2 (Oct 13th 23:00 UTC).
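The arithmetic in this example can be reproduced with a minimal Python sketch. It is an illustration of the calculation described above, not Validio's implementation: the function name `tumbling_freshness` is hypothetical, the year is arbitrary (the example gives only month and day), and treating a timestamp exactly equal to the window end as part of the window is an assumption.

```python
from datetime import datetime, timedelta, timezone


def tumbling_freshness(window_end, timestamps):
    """Return window_end minus the latest timestamp not later than window_end.

    Timestamps later than window_end are skipped here because they count
    toward the next window. Returns None if no timestamp qualifies.
    """
    eligible = [t for t in timestamps if t <= window_end]
    if not eligible:
        return None
    return window_end - max(eligible)


# The year is arbitrary; the example above gives only month and day.
window_end = datetime(2023, 10, 16, 0, 0, tzinfo=timezone.utc)
timestamps = [
    datetime(2023, 10, 12, 9, 30, tzinfo=timezone.utc),
    datetime(2023, 10, 13, 23, 0, tzinfo=timezone.utc),  # latest in window
    datetime(2023, 10, 16, 5, 0, tzinfo=timezone.utc),   # later than window end
]

freshness = tumbling_freshness(window_end, timestamps)

# The graph shows the window end in local system time (UTC+2 in the example).
local_end = window_end.astimezone(timezone(timedelta(hours=2)))

print(freshness)              # 2 days, 1:00:00
print(local_end.isoformat())  # 2023-10-16T02:00:00+02:00
```

The timestamp at 05:00 UTC on Oct 16th is left out of this window's value, matching the Y-axis description above.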
Freshness with a Global Window
Global windows are configured based on the polling time, when the validator checks for new data.
- X-axis--The timestamps on the X-axis represent the local clock-time corresponding to when source polls occur.
- Y-axis--The Y-axis displays the difference between polling time and the latest timestamp in the chosen datetime field.
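As a rough illustration of the Global window case, the sketch below computes one graph point per poll as polling time minus the latest timestamp. The function name `global_freshness`, the six-hour polling interval, and the sample dates are assumptions chosen for the illustration; the sketch also assumes no new data arrives between polls.

```python
from datetime import datetime, timedelta, timezone


def global_freshness(poll_time, latest_timestamp):
    """Return the gap between a poll and the latest timestamp in the field."""
    return poll_time - latest_timestamp


# Hypothetical values: the latest row in the date-time field, and three polls
# taken six hours apart with no new data arriving in between.
latest = datetime(2023, 10, 13, 23, 0, tzinfo=timezone.utc)
first_poll = datetime(2023, 10, 14, 0, 0, tzinfo=timezone.utc)
poll_interval = timedelta(hours=6)

polls = [first_poll + i * poll_interval for i in range(3)]
points = [(poll, global_freshness(poll, latest)) for poll in polls]

for poll, gap in points:
    print(poll.isoformat(), gap)
```

With no new data, the value grows by the polling interval at each poll: 1 hour, 7 hours, then 13 hours.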
Polling and Windows
There is a slight nuance to how window end times and polling (how often Validio queries for data) relate to each other, depending on whether you use a tumbling or global window.
- Tumbling window--The window size is set separately from the polling schedule, and the two can differ. The polling schedule can be more frequent than the window size, for example polling every six hours with a daily window.
- Global window--The window end time follows the polling time. For example, if you set the polling schedule to poll every six hours, the window size (X-axis) and window closing time will also be six hours.

Recommendation
If you have a daily batch job that you expect to complete at a specific time each day, set the polling schedule for the source to 1-2 hours after the expected job completion to get the fastest updates.
Example
Infrequent polling with a small window size is useful if you want to receive incidents on a daily basis but keep a 1-hour granularity in your freshness graph. You can set a tumbling window size of 1 hour and a daily polling schedule to save on query costs. Every day at polling time, Validio will backfill the 24 1-hour windows since the latest poll.
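To make the count concrete, the following sketch lists the hourly window end times that fall between two daily polls. It is an illustration under stated assumptions, not a description of Validio internals: the helper `windows_closed_between` is hypothetical, windows are assumed to be aligned to UTC, and the poll times are made up.

```python
from datetime import datetime, timedelta, timezone


def windows_closed_between(previous_poll, current_poll, window_size):
    """List window end times after previous_poll and up to current_poll.

    Assumes tumbling windows aligned to the Unix epoch in UTC.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Index of the first window end strictly after the previous poll.
    first_index = (previous_poll - epoch) // window_size + 1
    end = epoch + first_index * window_size
    ends = []
    while end <= current_poll:
        ends.append(end)
        end += window_size
    return ends


# Hypothetical daily polls at 02:00 UTC with a 1-hour tumbling window.
previous_poll = datetime(2023, 10, 15, 2, 0, tzinfo=timezone.utc)
current_poll = datetime(2023, 10, 16, 2, 0, tzinfo=timezone.utc)

ends = windows_closed_between(previous_poll, current_poll, timedelta(hours=1))
print(len(ends))  # 24
print(ends[0].isoformat(), ends[-1].isoformat())
```

Each daily poll therefore yields 24 graph points, one per hourly window, while the source is queried only once per day.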
Setting Window Timeout
For Tumbling windows, the "Disable window timeout" option works as follows:
- Timeout enabled--(Option is unchecked) Allows a grace period of up to 1 window length before closing the window, depending on whether new data has arrived.
- Timeout disabled--(Option is checked) Recommended for most Freshness cases. Closes the current window at the first poll, regardless of whether data has arrived.

A tumbling window with timeout enabled is primarily used when you expect irregular batch updates, such as when you expect data to sometimes arrive late and you don't want to be alerted each time. In this case, you can poll more frequently than the window size. Another example is when you don't load data in window-sized batches, such as when you have a job that continuously adds new data. In this case, you don't want to disable the timeout because you could then close the window prematurely.
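One way to read the two timeout settings is as a decision made at each poll. The sketch below is an interpretation of the description above, not Validio's actual logic: the function `should_close_window` and its parameters are hypothetical, and it assumes that with timeout enabled a window closes as soon as new data has arrived or once a grace period of one window length has passed.

```python
from datetime import datetime, timedelta, timezone


def should_close_window(window_end, window_size, poll_time,
                        new_data_arrived, timeout_disabled):
    """Decide at poll time whether a tumbling window should be closed."""
    if poll_time < window_end:
        return False  # The window has not reached its end time yet.
    if timeout_disabled:
        return True   # Close at the first poll, with or without data.
    # Timeout enabled: wait for new data, up to one window length (assumed).
    grace_expired = poll_time >= window_end + window_size
    return new_data_arrived or grace_expired


# Hypothetical daily window and a poll two hours after it ends.
window_end = datetime(2023, 10, 16, 0, 0, tzinfo=timezone.utc)
window_size = timedelta(days=1)
first_poll = window_end + timedelta(hours=2)

print(should_close_window(window_end, window_size, first_poll, False, True))
print(should_close_window(window_end, window_size, first_poll, False, False))
```

Under these assumptions, the first call closes the window immediately, while the second keeps it open until data arrives or the grace period ends.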