Freshness validators evaluate the time elapsed since the data was last updated on the source. Freshness works on "data time"--which means that it validates the actual date-time or timestamp column in the table (versus the window end time).

You can backfill data and use segmentation on Freshness validators.

Recommended Setup for Monitoring Freshness

The following setup applies to the majority of use cases for monitoring Freshness:

Use a daily tumbling window with "Disable Window timeout" checked.
Set the polling schedule to daily, 1-2 hours after the expected pipeline job completion.
(Optional) Configure a segmentation field.

Add a Freshness Validator

To add a Freshness Validator:

Navigate to the Source where you will add the validator, and click + New Validator.
Under Validator Type, select Freshness.
Under Segmentation, select an existing segmentation or create one. Default is Unsegmented.
Under Window, select an existing Tumbling or Global window or create one.
Under Source Config, select a Filter Type.
You can choose from: No Filter (Default), Boolean, Enum, Null, String, and Threshold. For more information, see Filters.
Under Config, the Initialize with backfill option is checked by default.
Under Threshold, select the Threshold Type and fill in the settings for Preset, Sensitivity, and Decision Bounds. For more information, see Dynamic Threshold for Freshness and About Thresholds.
Click Continue to create the Freshness validator.
After the validator is created, click View validator to open its details page.
It takes a few minutes for data to start populating the graph and table on this page, depending on the polling schedule and whether you manually backfill the validator.

Choosing the Window Type

You can only configure Freshness validators to use tumbling or global windows. In general, tumbling windows are recommended whenever possible to enable instant dynamic threshold training and backfill of historic data.

Global windows can be used when you:

Want to set up freshness validation on a large number of tables and don't need backfill or segmentation on these tables.
Prefer the X-axis to display polling time instead of data time (window end time).

For more information, see About Windows.

Interpreting the Freshness Graph

The time represented in the Freshness graph depends on the window type: data-time for a Tumbling window, and polling time for a Global window.

Freshness with a Tumbling Window

Tumbling windows are configured with a date-time field to represent the data time.

X-axis--The timestamps on the X-axis represent the end time for each window, using the date-time field. The window end time is based on UTC, but the graph displays the local system time. For example, daily windows always run from 00:00 UTC to 00:00 UTC, and hourly windows end at 01:00 UTC, 02:00 UTC, 03:00 UTC, and so on. The window end time is converted to the local system time before it is displayed on the graph.
Y-axis--The Y-axis displays the difference between window end time and the latest timestamp within the window. Any data containing dates later than the window end time, such as timestamps in the future, will be used in the freshness calculation for the next window.

Example

Tumbling window (screenshot):

The daily window closed on Oct 16th 00:00 UTC. Oct 16th 02:00 is displayed on the X-axis because the local system time is UTC+2 on the machine where the screenshot was captured.

For the window that closed on Oct 16th 02:00 UTC+2, the Y-axis value is 2 days 1 hour, meaning that the latest timestamp that existed in the data at the time of the window closing was Oct 14th 01:00 UTC+2 (Oct 13th 23:00 UTC).
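The arithmetic in this example can be checked with a short sketch. This is an illustration only, not Validio's implementation: the function name `tumbling_freshness` and the year are assumptions (the example gives no year), and the function simply applies the Y-axis rule described above, ignoring timestamps later than the window end.

```python
from datetime import datetime, timedelta, timezone


def tumbling_freshness(timestamps, window_end):
    """Window end time minus the latest timestamp at or before the window end."""
    # Timestamps later than the window end are left for a later window.
    eligible = [ts for ts in timestamps if ts <= window_end]
    if not eligible:
        return None  # no data yet; this case is not covered by the example
    return window_end - max(eligible)


local_tz = timezone(timedelta(hours=2))  # UTC+2, as in the example
window_end = datetime(2024, 10, 16, 0, 0, tzinfo=timezone.utc)  # year is arbitrary
timestamps = [
    datetime(2024, 10, 12, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 10, 13, 23, 0, tzinfo=timezone.utc),  # latest eligible timestamp
    datetime(2024, 10, 16, 5, 0, tzinfo=timezone.utc),   # after window end, ignored here
]

freshness = tumbling_freshness(timestamps, window_end)
x_axis_label = window_end.astimezone(local_tz)

print(freshness)                 # 2 days, 1:00:00
print(x_axis_label.isoformat())  # 2024-10-16T02:00:00+02:00
```

The printed values match the example: the X-axis shows Oct 16th 02:00 in UTC+2, and the Y-axis value is 2 days 1 hour.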

Freshness with a Global Window

Global windows are configured based on the polling time, when the validator checks for new data.

X-axis--The timestamps on the X-axis represent the local clock-time corresponding to when source polls occur.
Y-axis--The Y-axis displays the difference between polling time and the latest timestamp in the chosen datetime field.
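As a rough illustration of the Global window case, the sketch below computes the Y-axis value at three consecutive polls while the source receives no new data. The function name `global_freshness`, the six-hour polling interval, and all timestamps are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def global_freshness(poll_time, latest_timestamp):
    """Polling time minus the latest timestamp in the chosen datetime field."""
    return poll_time - latest_timestamp


# Hypothetical values: the data was last updated at 22:30 UTC and the
# source is polled every six hours starting at 06:00 UTC the next day.
latest_timestamp = datetime(2024, 1, 9, 22, 30, tzinfo=timezone.utc)
first_poll = datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc)
polls = [first_poll + i * timedelta(hours=6) for i in range(3)]

lags = [global_freshness(poll, latest_timestamp) for poll in polls]
for poll, lag in zip(polls, lags):
    print(poll.isoformat(), lag)
```

Because no new data arrives between polls, the computed value grows by six hours at each poll (7:30, 13:30, then 19:30).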

Polling and Windows

There is a slight nuance to how window end times and polling (how often Validio queries for data) relate to each other, depending on whether you use a tumbling or global window.

Tumbling window--The window size and the polling schedule are set separately and can differ. The polling schedule can be more frequent than the window size, for example polling every six hours with a daily window.
Global window--The window end time will follow the polling time. For example, if you set the polling schedule to poll every six hours, the window size (x-axis) and window closing time will also be six hours.

👍
Recommended
If you have a daily batch job that you expect to complete at a specific time each day, set the polling schedule for the source to 1-2 hours after the expected job completion to get the fastest updates.

Example

Infrequent polling can be combined with a small window size, for example if you want to receive incidents on a daily basis but keep a 1-hour granularity in your freshness graph. You can set a tumbling window size of 1 hour with a daily polling schedule to save on query costs. Every day at polling time, Validio backfills the 24 one-hour windows since the previous poll.
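To make the window count concrete, the following sketch lists the hourly window end times that fall between two daily polls. It is not Validio's scheduling logic: `window_ends_between` is a hypothetical helper, the poll times are made up, and aligning window boundaries to the Unix epoch in UTC is an assumption (it is consistent with daily windows ending at 00:00 UTC).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_ends_between(previous_poll, current_poll, window_size):
    """End times of tumbling windows that closed in (previous_poll, current_poll]."""
    # Index of the first window boundary strictly after the previous poll.
    n = (previous_poll - EPOCH) // window_size + 1
    end = EPOCH + n * window_size
    ends = []
    while end <= current_poll:
        ends.append(end)
        end += window_size
    return ends


# Hypothetical daily polling at 02:00 UTC with a 1-hour tumbling window.
previous_poll = datetime(2024, 3, 1, 2, 0, tzinfo=timezone.utc)
current_poll = datetime(2024, 3, 2, 2, 0, tzinfo=timezone.utc)

ends = window_ends_between(previous_poll, current_poll, timedelta(hours=1))
print(len(ends))            # 24
print(ends[0].isoformat())  # 2024-03-01T03:00:00+00:00
print(ends[-1].isoformat()) # 2024-03-02T02:00:00+00:00
```

A single daily poll therefore covers 24 hourly windows, which is why the graph keeps hourly granularity even though the source is queried once a day.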

Setting Window Timeout

Tumbling windows have a "Disable window timeout" option, which works as follows:

Timeout enabled--(Option is unchecked) Allows a grace period of up to 1 window length before closing the window, depending on whether new data has arrived.
Timeout disabled--(Option is checked) Recommended for most Freshness cases. Closes the current window at the first poll, regardless of whether data has arrived.

A tumbling window with timeout enabled is primarily used when you expect irregular batch updates, such as when data sometimes arrives late and you don't want to be alerted each time. In this case, you can poll more frequently than the window size. Another example is when you don't load data in window-sized batches, such as when a job continuously adds new data. In this case, you don't want to disable timeout because the window could then close prematurely.
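The difference between the two settings can be sketched as a small decision function. This is one possible reading of the behavior described above, not Validio's actual logic: the function `should_close_window`, its parameters, and in particular the interpretation that "new data" means data newer than the window end, are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def should_close_window(poll_time, window_end, window_length,
                        timeout_disabled, newer_data_seen):
    """Decide at poll time whether a tumbling window is closed (illustrative only)."""
    if poll_time < window_end:
        return False  # the window has not ended yet
    if timeout_disabled:
        return True   # close at the first poll after the window end
    # Timeout enabled (assumed reading): close once newer data has arrived,
    # or once a grace period of one window length has passed.
    return newer_data_seen or poll_time >= window_end + window_length


window_end = datetime(2024, 5, 2, 0, 0, tzinfo=timezone.utc)
day = timedelta(days=1)
poll = window_end + timedelta(hours=6)  # first poll after the window end

closed_when_disabled = should_close_window(poll, window_end, day, True, False)
closed_when_enabled = should_close_window(poll, window_end, day, False, False)
closed_after_grace = should_close_window(window_end + day, window_end, day, False, False)

print(closed_when_disabled, closed_when_enabled, closed_after_grace)  # True False True
```

Under these assumptions, a late batch that arrives within the grace period does not close the window early, which is why timeout enabled suits irregular or continuous loads.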

Dynamic Threshold for Freshness

When using Dynamic Threshold for freshness validation, the model monitors the timeliness of your data by learning historical arrival patterns to establish dynamic and acceptable levels of "staleness" (how outdated data can be). This allows the model to accurately detect when data becomes significantly more delayed than usual and helps you to identify potential pipeline issues.

At its core, the model quantifies staleness by comparing actual data arrival times against expected schedules. It learns patterns in these delays--including their frequency and any recurring cycles (daily, weekly, monthly)--and dynamically adjusts staleness thresholds accordingly. The model accounts for seasonality (such as typical weekend delays) and prioritizes common delay patterns over rare ones.

With Dynamic Threshold for Freshness, you can:

Establish normal staleness tolerances for daily, weekly, or monthly data updates.
Be alerted when data suddenly becomes significantly later than its historical norm.
Adjust how strictly the model alerts via sensitivity and adaptation rate. Lowering the sensitivity increases the accepted tolerance for late data.
Identify regular update patterns in both metadata Freshness and data Freshness Validators.

📘
Sensitivity Settings and Freshness
The Default sensitivity preset allows data to be one extra window late, to prevent alert fatigue. The Wide sensitivity preset allows data to be two extra windows late before alerting. This offset is useful when it is acceptable for data to be slightly late on occasion. The Narrow sensitivity preset is recommended for use cases where you want to be alerted as soon as data is later than expected.