About Custom SQL Sources

Custom SQL sources let you define the dataset to monitor by writing SQL queries directly in the source configuration. Use them to create Common Table Expressions (CTEs) inside Validio for joining tables, assessing cross-asset dependencies, and preprocessing data — without changing your underlying tables or expanding warehouse permissions.

Use Custom SQL to configure the data source

For SQL-based sources such as data warehouses and query engines, you can configure the dataset to monitor by writing custom SQL queries directly in the source configuration. These custom SQL sources behave like any other source in Validio — supporting dimension mapping, root cause analysis, and the full library of out-of-the-box validators — while also expanding the scope of your data monitoring.

Choosing Between a SQL Source and a Custom SQL Validator

Validio's extensive library of out-of-the-box validators covers most data quality rules. When you need to go beyond them, you have two options that can be used independently or together:

  • Custom SQL Source: reshape, join, or enrich the data so an out-of-the-box validator applies. The validator logic stays standard; only the dataset definition is custom.
  • Custom SQL Validator: keep the source as-is and define a bespoke metric calculation in SQL. Use this when the metric itself isn't expressible as an out-of-the-box validator.
NeedCustom SQL SourceCustom SQL Validator
Join two or more tables before validating
Filter or transform a column to fit an existing segmentation
Compare row counts or values across two assets✓ Use when the comparison logic is the metric
Convert a numeric column into categorical groups (for example, age buckets)
Compute an exponentially weighted moving average
Express a custom ratio or business-specific metric

Whenever a Custom SQL Source can do the job, prefer it — out-of-the-box validators come with built-in debug queries and richer root cause analysis workflows that Custom SQL Validators do not.

Supported Source Types

The following data sources support using custom SQL. For other sources, see About Sources.

Use Cases for Custom SQL Sources

The following are some use cases for why you would choose to configure your sources with custom SQL.

Assessing Cross-Asset Dependencies

A cross-asset dependency check evaluates data quality across two or more assets — for example, parity between a source and target table, referential integrity between fact and dimension tables, or row-count reconciliation across a transformation. By defining a custom SQL source that joins the relevant assets, you can express the dependency once and apply any out-of-the-box validator to the result.

For end-to-end examples, see Custom SQL Validator Recipes.

Creating Data Diff Views

Custom SQL sources are well suited for comparing two datasets — for example, before and after a transformation, source against target, or a snapshot against the current state. Define the diff in SQL once, and monitor it with any out-of-the-box validator to catch unexpected changes.

Preprocessing Data Before Validation

Use a custom SQL source to reshape data so that it fits an out-of-the-box validator, instead of relying on a Custom SQL Validator to handle every transformation inline. Preprocessing in the source layer keeps validator logic standard and unlocks Validio's full debug query and root cause analysis support.

Reused Shared SQL Code

You may find yourself replicating the same logic across multiple custom SQL validators. If that is the case, you can isolate that code in a custom SQL source. These can include code for casting strings to timestamps repeatedly, as well as more complicated logic transformation logic. An added benefit of moving this shared logic to custom SQL sources is that you might be able to take advantage of out-of-the-box validators, leading to improved debug queries and RCA workflows.

Dynamic Segment Creation

You might have a column that you want to compute further aggregations on. For example, you can convert numeric ages into a categorical dimension of age groups; or take a date and break it down into individual components of year, month, and day. Doing this will then allow you to take advantage of all of our downstream segmentation functionality for these dynamically created categorical dimensions.

Precomputing Segment Frequencies

For more intricate segment filtering, you can precompute segment frequencies or counts as part of a custom SQL source and then use these frequencies for filtering within Validio.

Fact and Dimension Table Architectures

In many cases you are monitoring a fact table but need to include information about particular dimensions. To support these use cases, you can create a SQL source to JOIN a fact and dimension table and then group rows by a dimension attribute. Then, you can configure a validator to monitor anomalies in that grouping.

Monitoring Unstructured Data

Custom SQL sources are a great place to utilize LLMs in your warehouse for validating unstructured data. For example, take advantage of Snowflake Cortex, Google BigQuery's Gemini or Databricks AI Functions to validate misspelled or inconsistent naming for data in a column. Read more in our blog post, "Bringing data quality to unstructured data with LLMs and Custom SQL".

Creating "Views" within Validio

Custom SQL sources behave like views, but live entirely inside Validio. You don't need create-view permissions in your warehouse, you don't need to ask a platform team to update an ETL job, and the view's lifecycle stays attached to the validator that depends on it. Use them when you'd benefit from a view but can't — or don't want to — create one in the underlying system.