HomeDocumentationRecipesChangelog
HomeRequest DemoContact
Documentation
HomeRequest DemoContact

Custom SQL

For data from data warehouse or query engine sources, you can leverage SQL to create validators customized for your business needs. For example, you can use a custom SQL validator to count duplicates of values or records, compare fields or records between tables, apply custom filters to your data, and more.

📘

Note

This article discusses the updated Custom SQL validator configuration and syntax. For the previous version of the Custom SQL validator, refer to Custom SQL (deprecated). For migration guidelines between the syntax versions, see Migrating Custom SQL Syntax.

Configuring a Custom SQL Validator

To configure a Custom SQL Validator,

  1. Navigate to the specific source where you want to add the validator.
  2. Click + New Validator to start the configuration wizard.
  3. Under Source Type, select Custom SQL.
  4. Under Segmentation, segment your data into subsets for more granular validation. Check one of the following options:
    1. Existing Segmentation–to select a pre-configured segmentation from the list.
    2. Create Segmentation–to select fields to include in the new segmentation.
  5. Under Window, specify the amount of data or time over which the validator operates. Check one of the following options:
    1. Existing Window–to select a pre-configured window from the list.
    2. Create Window–to configure a new window, which can be a Global window or Tumbling window. You will see different configuration parameters depending on the window type. For more information, see About Windows.
  6. Under SQL Query,
    1. Enter a valid SQL expression. The SQL expression must return a field named validio_metric and columns that match the segmentation fields configured for the validator. If you are using a tumbling window, the SQL expression can use Template Functions and must also return a field named validio_window_id. For more information, see Mandatory Fields, Segmentation Fields, and Template Functions.
    2. (Recommended) Click Test Query to verify that your SQL expression is valid. For more information, see Testing the SQL Query.
  7. Under Config, check Initialize with Backfill if you want to include historical data for validation.
  8. Under Thresholds, select the type of threshold to use to detect incidents. You can configure a Fixed threshold, Dynamic threshold, or a Difference threshold. For more information, see About Thresholds.
  9. Enter a Validator Name.
  10. Click Continue to create the Custom SQL Validator.

For examples of Custom SQL validator configurations, refer to Custom SQL Validator Recipes.

Testing the SQL Query

You can test the SQL query for your custom validator to verify that your SQL expression is valid:

  • Test Query–(Recommended) Test that your SQL query is valid.
    • If the query executes properly, the query returns a table with the first 20 records.
    • If the query did not execute properly, the query returns an error message propagated from the data source.
  • Skip Test–Continue configuring the validator without testing.

👍

Recommendation

Because syntax errors are common in custom SQL, we recommend always testing the query so that you can discover and correct errors early, instead of during source polling.

Mandatory Fields

The SQL query must return the mandatory fields validio_metric and, if the validator uses a tumbling window, validio_window_id.

📘

Note

Thevalidio_metric and validio_window_id fields should be written in lower case, which requires using quotation marks for some data sources, such as double quotation marks for Snowflake.

Validation Metric

The Custom SQL query must return a field named validio_metric to instruct Validio which field to use for validation.

Validation Window

You can configure a Custom SQL validator to use either Tumbling or Global windows.

  • With Tumbling windows, you can write the SQL query using Validio template functions The SQL query must return the validio_metric field as well as a field named validio_window_id. This ID is used to map the metrics for each tumbling window to their respective time ranges. For more information, see Template Functions.
  • With Global windows, you can use regular SQL syntax (which must return the validio_metric field).

📘

Note

A tumbling window uses a cursor field to keep track of what metrics have already been processed and what has not. For more information, see Tumbling Windows.

Segmentation Fields

The Custom SQL query must return columns that match the Segmentation fields configured for the validator. For example, if segmentation on Country is used, then the SQL query should return a column called Country. All returned segmentation columns must be of type string, and if necessary you can use casting to ensure the type. For multi-field segmentations, each field is displayed as a separate column.

Aliases for the returned segmentation columns must follow the following naming conventions:

Naming ConventionsExample of Segmentation Field → Column Name
Match the name and case of the underlying column. For some data sources, you will need to use quotation marks around the column name.country → country
ZipCode → ZipCode
Non-alphanumeric characters should be replaced with an underscore.foo:bar.baz → foo_bar_baz
For semi-structured segmentations using arrays, the array access should be replaced with underscore followed by the index.foo:bar.baz[0].qux → foo_bar_baz_0_qux
If the segmentation is using the array length field, the alias should be suffixed with _length.foo:bar.baz[0].qux.length() → foo_bar_baz_0_qux_length

Template Functions

You can write your custom SQL Query to use placeholders and template functions to reference Source, Segmentations, and Windows to help integrate your custom SQL logic with Validio tumbling windows.

🚧

Important

Template functions are only available if your Custom SQL Validator is configured to use tumbling windows.


PlaceholderDescription
{{ validio_window_id(<identifier>) }}(Required) Used to generate an ID for all data points that belong to the same window. Must be placed in the SELECT clause.

For more information, see Window ID.
{{ validio_window_filter(<identifier>) }}(Required) Used to process only the data that has not already been processed. Must be placed in the WHERE clause.

For more information, see Window Filter .
{{ validio_reference_window_filter(<identifier>, <lookback>) }}(Optional) Used when data in the current window needs to be compared to data in previous windows. Must be placed in the WHERE clause.

For more information, see Reference Window Filter.

Window ID

The required function validio_window_id(<identifier>) is used to generate an ID for all data points that belong to the same window. The function must be placed in the SELECT clause.

The function expands to a formula that calculates which window the datetime in <identifier> belongs to. Then it counts the number of windows the calculated window is from a reference timestamp to get a monotonically increasing numerical ID for each window in the query. (The reference timestamp is often the timestamp for the oldest data point for the current poll.)

This column must be included in the GROUP BY clause to aggregate over all data points in the same window. Because it is used to link metrics to the correct window time ranges, the column must be returned as the name validio_window_id (matching the case).

Window Filter

The required function validio_window_filter(<identifier>) expands to a filter condition that limits the query to the active time range. For example, during a backfill poll the function will expand to a time range that covers the lookback period. During a normal poll the function will span back to where the last poll completed, which will be the time range for the latest window (or windows). The <identifier> should be the timestamp column, which is usually the cursor column.

The function expands to the following filter condition:

<identifier> >= <first_window_start_timestamp> AND
<identifier < <last_window_end_timestamp>

This function must be used on the source table, but can also be used on other tables, such as when comparing data from two different tables. (You can see this demonstrated in the Tumbling Reference Count recipe in Custom SQL Validator Recipes.) The window filter function works as an optimization to process only the data that is relevant for the current poll.

Reference Window Filter

The optional function validio_reference_window_filter(<identifier>, <lookback>) is useful when doing reference validation, where data in the current window needs to be compared to data in previous windows. This function is similar to the Window Filter function, but takes a second argument that can be used to extend the time range with a <lookback> amount of windows.

The function expands to the following filter condition:

<identifier> >= <first_window_start_timestamp - lookback windows> AND
<identifier < <last_window_end_timestamp>

You can see an example of a reference window filter function in the Segmented Tumbling Mean Ratio recipe in Custom SQL Validator Recipes.

Custom SQL Query Example

This example demonstrates how to use template functions to build a custom SQL query. The validator is configured with a daily tumbling window and segmentation on country.

The following custom SQL query uses template functions:

SELECT
	`country`,
	COUNT(*) AS `validio_metric`,
	{{ validio_window_id(`created_at`) }} AS `validio_window_id`
FROM
	`project`.`dataset`.`table`
WHERE
	{{ validio_window_filter(`created_at`) }}
GROUP BY
	`country`,
	-- Note: not all warehouses support grouping by alias like this.
	-- In those cases the template function can be used here as well.
	`validio_window_id`

The following shows the full SQL query after the template functions have been expanded:

SELECT
	`country`,
	COUNT(*) AS `validio_metric`,
	-- Expanded {{ validio_window_id(`created_at`) }}
	FLOOR(
	    	CAST(
       	  	TIMESTAMP_DIFF(
            			`created_at`,
            			'2024-02-26T00:00:00Z', —- Reference timestamp
            			SECOND
        		) AS FLOAT64
    		) / 86400
	) AS `validio_window_id`,
FROM
	`project`.`dataset`.`table`
WHERE
	-- Expanded {{ validio_window_filter(`created_at`) }}
	-- In this example it is a backfill poll for a source with 7 days lookback 
	`created_at` >= '2024-02-26T00:00:00Z' AND
	`created_at` < '2024-03-04T00:00:00Z'
GROUP BY
	`country`,
	`validio_window_id`

The following table shows the output of the SQL query:

validio_metricvalidio_window_idcountrywindow_start1window_end1
100SE2024-02-26T00:00:00Z2024-02-27T00:00:00Z
150US2024-02-26T00:00:00Z2024-02-27T00:00:00Z
81SE2024-02-27T00:00:00Z2024-02-28T00:00:00Z
131US2024-02-27T00:00:00Z2024-02-28T00:00:00Z

1 The window_start and window_end columns are shown for illustrative purposes and are not part of the SQL validator query.

For more examples of validator configurations, refer to Custom SQL Validator Recipes.