For data from data warehouse or query engine sources, you can leverage SQL to create validators customized for your business needs. For example, you can use a custom SQL validator to count duplicates of values or records, compare fields or records between tables, apply custom filters to your data, and more.

📘
Custom SQL Validator Versions
This article discusses the updated Custom SQL validator configuration and syntax. For the previous version of the Custom SQL validator, refer to Custom SQL (deprecated). For migration guidelines between the syntax versions, see Migrating Custom SQL Syntax.

Configuring a Custom SQL Validator

To configure a Custom SQL Validator,

Navigate to the specific source where you want to add the validator.
Click + New Validator to start the configuration wizard.
Under Validator Type, select Custom SQL.
Under Metric,
1. Select a Window or create a new window to specify the amount of data o time over which the validator operates.
2. Select a Segmentation or create a new segmentation to break your data into separate groups for analysis.
3. Define the metric to calculate using a valid SQL expression. The SQL expression must return a field named validio_metric and columns that match the segmentation fields configured for the validator. If you are using a tumbling window, the SQL expression can use Template Functions and must also return a field named validio_window_id. For more information, see Mandatory Fields, Segmentation Fields, and Template Functions.
4. (Recommended) Click Test Query to verify that your SQL expression is valid. For more information, see Testing the SQL Query.
5. Check Initialize using historic data to backfill with historical data for validation.
Under Thresholds, select the type of threshold to use to detect incidents. You can configure a Fixed threshold, Dynamic threshold, or a Difference threshold. For more information, see About Thresholds.
Under Validator Details, enter a Name, Description, Tags, and Owner.
Click Continue to create the Custom SQL Validator.

For examples of Custom SQL validator configurations, refer to Custom SQL Validator Recipes.

Testing the SQL Query

You can test the SQL query for your custom validator to verify that your SQL expression is valid:

Test Query–(Recommended) Test that your SQL query is valid.
- If the query executes properly, the query returns a table with the first 20 records.
- If the query did not execute properly, the query returns an error message propagated from the data source.
Skip Test–Continue configuring the validator without testing.

👍
Recommendation
Because syntax errors are common in custom SQL, we recommend always testing the query so that you can discover and correct errors early, instead of during source polling.

Mandatory Fields

The SQL query must return the mandatory fields validio_metric and, if the validator uses a tumbling window, validio_window_id.

❗️
Important
Thevalidio_metric and validio_window_id fields should be written in lower case, which requires using quotation marks for some data sources, such as double quotation marks for Snowflake.

Validation Metric

The Custom SQL query must return a field named validio_metric to instruct Validio which field to use for validation.

Validation Window

You can configure a Custom SQL validator to use either Tumbling or Global windows.

With Tumbling windows, you can write the SQL query using Validio template functions The SQL query must return the validio_metric field as well as a field named validio_window_id. This ID is used to map the metrics for each tumbling window to their respective time ranges. For more information, see Template Functions.
With Global windows, you can use regular SQL syntax (which must return the validio_metric field).

Segmentation Fields

The Custom SQL query must return columns that match the Segmentation fields configured for the validator. For example, if segmentation on Country is used, then the SQL query should return a column called Country. All returned segmentation columns must be of type string, and if necessary you can use casting to ensure the type. For multi-field segmentations, each field is displayed as a separate column.

Aliases for the returned segmentation columns must follow the following naming conventions:

Naming Conventions	Example of Segmentation Field → Column Name
Match the name and case of the underlying column. For some data sources, you will need to use quotation marks around the column name.	country → country ZipCode → ZipCode
Non-alphanumeric characters should be replaced with an underscore.	foo:bar.baz → foo_bar_baz
For semi-structured segmentations using arrays, the array access should be replaced with underscore followed by the index.	foo:bar.baz[0].qux → foo_bar_baz_0_qux
If the segmentation is using the array length field, the alias should be suffixed with _length.	foo:bar.baz[0].qux.length() → foo_bar_baz_0_qux_length

Template Functions

You can write your custom SQL Query to use placeholders and template functions to reference Source, Segmentations, and Windows to help integrate your custom SQL logic with Validio tumbling windows.

❗️
Important
Template functions are only available if your Custom SQL Validator is configured to use tumbling windows.

Placeholder	Description
`{{ validio_window_id(<identifier>) }}`	(Required) Used to generate an ID for all data points that belong to the same window. Must be placed in the SELECT clause. For more information, see Window ID.
`{{ validio_window_filter(<identifier>) }}`	(Required) Used to process only the data that has not already been processed. Must be placed in the WHERE clause. For more information, see Window Filter .
`{{ validio_reference_window_filter(<identifier>, <lookback>) }}`	(Optional) Used when data in the current window needs to be compared to data in previous windows. Must be placed in the WHERE clause. For more information, see Reference Window Filter.
`{{ validio_segmentation_filter() }}`	(Optional) Used when a segmentation filter is specified on the segmentation attached to a validator. Must be placed in the WHERE clause. For more information, see Segmentation Filter.
`{{ validio_source_table }}`	(Required for SQL sources) Used as a convenience function to get the source table identifier. You must use this function if the source is a SQL source (meaning that it is configured using a custom SQL query). For more information, see Source Table.

Window ID

The required function validio_window_id(<identifier>) is used to generate an ID for all data points that belong to the same window. The function must be placed in the SELECT clause.

The function expands to a formula that calculates which window the datetime in <identifier> belongs to. Then it counts the number of windows the calculated window is from a reference timestamp to get a monotonically increasing numerical ID for each window in the query. (The reference timestamp is often the timestamp for the oldest data point for the current poll.)

This column must be included in the GROUP BY clause to aggregate over all data points in the same window. Because it is used to link metrics to the correct window time ranges, the column must be returned as the name validio_window_id (matching the case).

Window Filter

The required function validio_window_filter(<identifier>) expands to a filter condition that limits the query to the active time range. For example, during a backfill poll the function will expand to a time range that covers the lookback period. During a normal poll the function will span back to where the last poll completed, which will be the time range for the latest window (or windows). The <identifier> should be the timestamp column, which is usually the cursor column.

The function expands to the following filter condition:

<identifier> >= <first_window_start_timestamp> AND
<identifier < <last_window_end_timestamp>

This function must be used on the source table, but can also be used on other tables, such as when comparing data from two different tables. (You can see this demonstrated in the Tumbling Reference Count recipe in Custom SQL Validator Recipes.) The window filter function works as an optimization to process only the data that is relevant for the current poll.

Reference Window Filter

The optional function validio_reference_window_filter(<identifier>, <lookback>) is useful when doing reference validation, where data in the current window needs to be compared to data in previous windows. This function is similar to the Window Filter function, but takes a second argument that can be used to extend the time range with a <lookback> amount of windows.

The function expands to the following filter condition:

<identifier> >= <first_window_start_timestamp - lookback windows> AND
<identifier < <last_window_end_timestamp>

You can see an example of a reference window filter function in the Segmented Tumbling Mean Ratio recipe in Custom SQL Validator Recipes.

Segmentation Filter

The function validio_segmentation_filter() must be used in place of a segmentation filter expression specified on the segmentation attached to the SQL validator. The function automatically expands to the filter expression before polling for new data on the validator.

You can also use the segmentation filter placeholder in a query when a segmentation filter is not used.

❗️
Important
For the SQL validator to be valid, segmentation filtering for the validator must also be applied to its query. Otherwise polling will fail and the SQL validator will not be created if there is a segmentation filter present without a matching placeholder in the query.

Source Table

The function validio_source_table() is a convenience function that can be used instead of manually writing the source table identifier. This function is only required if you are using custom SQL to specify the source table when configuring your data source.

This function should be used in the FROM clause and it will expand to the quoted identifier for the table. For example, a BigQuery source expands into:

`project`.`dataset`.`table`

📘
SQL Validator and SQL Sources
If the SQL validator is used on a SQL source then the validio_source_table() function must be used as the table identifier since the SQL source query definition will be added as a CTE to the validator query during runtime. In this case the function expands to the name of the CTE.
For more information, see About Custom SQL Sources.

Custom SQL Query Example

This example demonstrates how to use template functions to build a custom SQL query. The validator is configured with a daily tumbling window and segmentation on country.

The following custom SQL query uses template functions:

SELECT
	`country`,
	COUNT(*) AS `validio_metric`,
	{{ validio_window_id(`created_at`) }} AS `validio_window_id`
FROM
	`project`.`dataset`.`table`
WHERE
	{{ validio_window_filter(`created_at`) }}
GROUP BY
	`country`,
	-- Note: not all warehouses support grouping by alias like this.
	-- In those cases the template function can be used here as well.
	`validio_window_id`

The following shows the full SQL query after the template functions have been expanded:

SELECT
	`country`,
	COUNT(*) AS `validio_metric`,
	-- Expanded {{ validio_window_id(`created_at`) }}
	FLOOR(
	    	CAST(
       	  	TIMESTAMP_DIFF(
            			`created_at`,
            			'2024-02-26T00:00:00Z', —- Reference timestamp
            			SECOND
        		) AS FLOAT64
    		) / 86400
	) AS `validio_window_id`,
FROM
	`project`.`dataset`.`table`
WHERE
	-- Expanded {{ validio_window_filter(`created_at`) }}
	-- In this example it is a backfill poll for a source with 7 days lookback 
	`created_at` >= '2024-02-26T00:00:00Z' AND
	`created_at` < '2024-03-04T00:00:00Z'
GROUP BY
	`country`,
	`validio_window_id`

The following table shows the output of the SQL query:

`validio_metric`	`validio_window_id`	`country`	`window_start` ¹	`window_end` ¹
10	0	SE	2024-02-26T00:00:00Z	2024-02-27T00:00:00Z
15	0	US	2024-02-26T00:00:00Z	2024-02-27T00:00:00Z
8	1	SE	2024-02-27T00:00:00Z	2024-02-28T00:00:00Z
13	1	US	2024-02-27T00:00:00Z	2024-02-28T00:00:00Z

¹ The window_start and window_end columns are shown for illustrative purposes and are not part of the SQL validator query.

For more examples of validator configurations, refer to Custom SQL Validator Recipes.