Documentation
HomeRequest DemoContact

Custom SQL Validator Recipes

The following Custom SQL validator examples are based on BigQuery SQL. You can use these recipes as a starting point when writing Custom SQL validators for other data sources, such as Redshift or Athena.

For information about configuring Custom SQL validators, see Custom SQL.

Duplicate Count

This example validator counts the number of occurrences for a specific column with a window and sums the total count. The resulting value represents the number of duplicates for that particular value.

SELECT
    `validio_window_id`,
    SUM(duplicates) as `validio_metric`
FROM (
    SELECT
        {{ validio_window_id(`time`) }} AS `validio_window_id`,
        COUNT(*) - 1 AS duplicates
    FROM 
        `project`.`dataset`.`table`
    WHERE
        {{ validio_window_filter(`time`) }}
    GROUP BY
        `validio_window_id`, `column_to_check_for_duplicates`
)
GROUP BY
    `validio_window_id`

For the following example data, the result would be 3, where the duplicate values are b, b, and NULL.

column_to_check_for_duplicates
a
b
b
b
NULL
NULL

Referential Integrity

This example validator computes how many non-NULL values exist in a foreign key table which is not present in the primary key table.

SELECT
    {{ validio_window_id(`foreign_key_table`.`time`) }} AS `validio_window_id`,
    COUNTIF(`primary_key_table`.`foreign_key` IS NULL) AS `validio_metric`
FROM
    `project`.`dataset`.`foreign_key_table`
LEFT JOIN
    `project`.`dataset`.`primary_key_table`
ON
    `foreign_key_table`.`foreign_key` = `primary_key_table`.`foreign_key`
WHERE
    {{ validio_window_filter(`foreign_key_table`.`time`) }} AND
    `foreign_key_table`.`foreign_key` IS NOT NULL
GROUP BY
    `validio_window_id`

For the following example tables, the result would be 2, where the missing keys are 1003 and 1004.

foreign_key_table.foreign_key
1001
1002
1003
1004
NULL
primary_key_table.primary_key
1001
1002

Global Row Count

This example validator uses a global window to count the total number of records in the table. No template functions are needed since the full table will be scanned during each poll. The only requirement is that the query must return the validio_metric column.

SELECT
	COUNT(*) AS `validio_metric`
FROM
	`project`.`dataset`.`table`

Segmented Tumbling Row Count

This example validator is configured with a tumbling window on the field created_at and a segmentation on the field country. The SQL query uses the window template functions to calculate the number of records per country and window. The query must return the validio_metric, validio_window, and country columns.

SELECT
    {{ validio_window_id(`created_at`) }} AS `validio_window_id`,
    `country`
    COUNT(*) AS `validio_metric`
FROM
    `project`.`dataset`.`table`
WHERE
    {{ validio_window_filter(`created_at`) }}
GROUP BY
    `validio_window_id`,
    `country`

Segmented Tumbling Mean Ratio

This example validator is configured with a tumbling window on the field created_at and a segmentation on the field city. The SQL query uses the window template functions to calculate the mean ratio by comparing the mean of amount in the current window against the two previous windows. The query must return the validio_metric, validio_window, and city columns.

SELECT
    	`target`.`validio_window_id`,
    	`target`.`city`,
    	`target`.`target_mean` / NULLIF(
        	(
            	-- Calculate the mean for the previous two windows 
            	SUM(`reference`.`reference_amount`) /
            	NULLIF(COUNT(CAST(`reference`.`reference_amount` AS FLOAT64)), 0)
        	),
        	0
    	) AS `validio_metric`
FROM (
    	-- Calculate the mean for each window + segment
    	SELECT
            	{{ validio_window_id(`created_at`)}} AS `validio_window_id`,
            	`city`,
            	AVG(amount) AS `target_mean`
    	FROM
            	`project`.`dataset`.`table`
    	WHERE
            	{{ validio_window_filter(`created_at`) }}
    	GROUP BY
            	`validio_window_id`,
            	`city`
) `target`
JOIN (
    	SELECT
            	{{ validio_window_id(`created_at`)}} AS `validio_window_id`,
            	`city`,
            	`amount` AS `reference_amount`
    	FROM
            	`project`.`dataset`.`table`
    	WHERE
            	-- Use the reference version of the filter function
            	-- to include records from the two previous windows.
            	{{ validio_reference_window_filter(`created_at`, 2) }}
) `reference` ON (
    	-- Join the current window + segment with records from the two previous windows
    	`reference`.`validio_window_id` >= `target`.`validio_window_id` - 2 AND
    	`reference`.`validio_window_id` <= `target`.`validio_window_id` - 1 AND
    	`reference`.`city` = `target`.`city`
)
GROUP BY
    	`target`.`validio_window_id`,
    	`target`.`city`,
    	`target`.`target_mean`

Tumbling Reference Count

This example validator compares the number of records in two tables using hourly tumbling windows.

WITH ref AS (
	-- count web records in aggregation table per window
	SELECT
    		count(*) as `count`,
    		{{ validio_window_id(`created_at`) }} as `validio_window_id`
	FROM
    		`project`.`dataset`.`traffic_agg`
	WHERE
    		{{ validio_window_filter(`created_at`) }} AND
    		`platform` = 'web'
	GROUP BY
    		`validio_window_id`
), target AS (
	-- count records in web table per window
	SELECT
    		count(*) as `count`,
    		{{ validio_window_id(`created_at`) }} as `validio_window_id`
	FROM
    		`project`.`dataset`.`traffic_web`
	WHERE
    		{{ validio_window_filter(`created_at`) }}
	GROUP BY
    		`validio_window_id`
)
SELECT
	`target`.`validio_window_id`,
	(COALESCE(`ref`.`count`, 0) / `target`.`count`) * 100 AS `validio_metric`
FROM
	`target`
LEFT JOIN
	`ref` ON (
    		`ref`.`validio_window_id` = `target`.`validio_window_id`
	)

For the following example tables, the result is: window 10:00 = 100, window 11:00 = 50.

traffic_web.created_at
10:00
10:01
11:00
11:01
traffic_agg.created_attraffic_agg.platform
10:00web
10:01web
11:01web

Duration Between Records

This example validator is configured with an hourly tumbling window on created_at and calculates the maximum duration in seconds between the records timestamp in each window.

SELECT
	`validio_window_id`,
	MAX(`duration`) AS `validio_metric`,
FROM (
	SELECT
    		{{ validio_window_id(`created_at`) }} AS `validio_window_id`,
    		COALESCE(
        		TIMESTAMP_DIFF(
            			`created_at`,
            			LAG(`created_at`) OVER(PARTITION BY {{ validio_window_id(`created_at`) }} ORDER BY `created_at`),
            			SECOND
        		),
    		0) AS `duration`
	FROM
    		`project`.`dataset`.`table`
	WHERE
    		{{ validio_window_filter(`created_at`) }}
)
GROUP BY
	`validio_window_id`

For the following example data, the result is: window 10:00 = 9s, window 11:00 = 5s.

traffic_web.created_at
10:00:00
10:00:01
11:00:10
11:00:10
11:00:15