About Custom SQL Sources
For SQL-based sources such as data warehouses and query engines, you can configure the dataset to monitor by writing custom SQL queries directly in the source configuration. These custom SQL sources behave like any other source in Validio, while also expanding the scope of your data monitoring with support for dimension mapping and better root cause analysis.
When configuring a custom SQL validator on a SQL source, the SQL query must include the source table
template function. For more information, see Custom SQL Template Functions.
Supported Source Types
The following data sources support using custom SQL. For other sources, see About Sources.
Use Cases for Custom SQL Sources
The following are some use cases for why you would choose to configure your sources with custom SQL.
Monitoring Unstructured Data
Custom SQL sources are a great place to utilize LLMs in your warehouse for validating unstructured data. For example, take advantage of Snowflake Cortex, Google BigQuery's Gemini or Databricks AI Functions to validate misspelled or inconsistent naming for data in a column. Read more in our blog post, "Bringing data quality to unstructured data with LLMs and Custom SQL".
Reused Shared SQL Code
You may find yourself replicating the same logic across multiple custom SQL validators. If that is the case, you can isolate that code in a custom SQL source. These can include code for casting strings to timestamps repeatedly, as well as more complicated logic transformation logic. An added benefit of moving this shared logic to custom SQL sources is that you might be able to take advantage of out-of-the-box validators, leading to improved debug queries and RCA workflows.
Fact and Dimension Table Architectures
In many cases you are monitoring a fact table but need to include information about particular dimensions. To support these use cases, you can create a SQL source to JOIN a fact and dimension table and then group rows by a dimension attribute. Then, you can configure a validator to monitor anomalies in that grouping.
Dynamic Segment Creation
You might have a column that you want to compute further aggregations on. For example, you can convert numeric ages into a categorical dimension of age groups; or take a date and break it down into individual components of year, month, and day. Doing this will then allow you to take advantage of all of our downstream segmentation functionality for these dynamically created categorical dimensions.
Creating "Views" within Validio
You might not have editor access for creating views outside of Validio in your warehouse or ETL tool. However, you can still create custom SQL sources inside of Validio without the need to expand access controls outside of Validio.
Precomputing Segment Frequencies
For more intricate segment filtering, you can precompute segment frequencies or counts as part of a custom SQL source and then use these frequencies for filtering within Validio.
Updated 1 day ago