Validator Types

Overview of supported Validator types used for calculating metrics.

Figure: Validator type options when creating a new validator on a source.

Validio supports different types of validators for different monitoring use cases, such as pipeline health, data consistency, and completeness. You can also create a validator using SQL queries to monitor custom metrics. This guide lists the validator types and the metrics each type supports.

📘

Validator Metrics and Windows: Validators calculate metrics over a window. For example, a validator can calculate the mean value over a daily window and then validate whether these daily mean values follow an expected seasonal pattern. See About Validators and Configuring Validators.
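To make the windowing idea concrete, here is a minimal Python sketch of computing a mean per daily window. It is an illustration only, not Validio's implementation: the function name `daily_means` and the sample records are hypothetical, and the seasonal-pattern check that would follow is not shown.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(records):
    """Group (timestamp, value) pairs by calendar day and return the mean per day.

    Hypothetical helper for illustration; each calendar day acts as one window.
    """
    windows = defaultdict(list)
    for ts, value in records:
        windows[ts.date()].append(value)
    # One metric value per daily window, ordered by date.
    return {day: mean(values) for day, values in sorted(windows.items())}


# Made-up sample data: two records on the first day, one on the second.
records = [
    (datetime(2024, 1, 1, 9, 0), 10.0),
    (datetime(2024, 1, 1, 17, 30), 14.0),
    (datetime(2024, 1, 2, 8, 15), 20.0),
]
result = daily_means(records)
for day, value in result.items():
    print(day, value)
```

Each printed line is one metric value for one window. A validator would then compare this series of values against a threshold or an expected pattern.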

Pipeline Health

Evaluate data pipeline reliability to identify issues in ingestion or processing by monitoring row counts and freshness.

| Validator Type | Description | Metric Options |
| --- | --- | --- |
| Freshness | Ensure data is timely by checking if its timestamp is within the expected range. | Freshness |
| Row Count | Verify the number of rows in a table meets expected thresholds. | Count |
| Freshness (metadata) | Ensure data is timely by checking if the table was last updated within the expected time range. This check is based on the warehouse metadata. | Freshness |
| Row Count (metadata) | Ensure the row count in the table is within the expected range. This check is based on the warehouse metadata. | Count |

❗️

Metadata Validators for Data Warehouse Sources: Metadata validators, Freshness (metadata) and Row Count (metadata), are only available for BigQuery and Snowflake sources.

Uniqueness

Maintain quality standards by checking for duplicate or distinct values in specific fields.

| Validator Type | Description | Metric Options |
| --- | --- | --- |
| Distinct Values | Ensure a column contains only distinct values or matches expected uniqueness. | Unique Count, Unique Percentage |
| Duplicate Values | Identify duplicate entries in a column to maintain data integrity. | Duplicate Count, Duplicate Percentage |
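The following Python sketch shows one plausible way to compute the four uniqueness metrics over a window of values. The definitions are assumptions made for illustration: unique count is taken as the number of distinct values, and duplicate count as the number of rows beyond the first occurrence of each value. Validio's exact definitions may differ, and the function name `uniqueness_metrics` is hypothetical.

```python
def uniqueness_metrics(values):
    """Return unique and duplicate counts and percentages for a list of values.

    Assumed definitions (not confirmed by the Validio documentation):
    unique_count = number of distinct values,
    duplicate_count = total rows minus distinct values.
    """
    total = len(values)
    if total == 0:
        return None  # no data in the window, so no metric
    distinct = len(set(values))
    duplicates = total - distinct
    return {
        "unique_count": distinct,
        "unique_percentage": 100.0 * distinct / total,
        "duplicate_count": duplicates,
        "duplicate_percentage": 100.0 * duplicates / total,
    }


# Made-up sample column: 6 rows, 3 distinct values.
metrics = uniqueness_metrics(["a", "b", "b", "c", "c", "c"])
print(metrics)
```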

Completeness

Ensure datasets meet completeness requirements by checking for null values, empty strings, or missing data.

| Validator Type | Description | Metric Options |
| --- | --- | --- |
| Null Values | Check for null values to ensure data completeness and reliability. | Count, Percentage |
| Empty Strings | Check a specific field for empty string values to maintain validity. | Count, Percentage |
| Enum Values | Ensure a field matches a predefined set of allowed values. | Count, Percentage |
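All three completeness validators offer the same two metric options, Count and Percentage. The sketch below illustrates that shared shape in Python with a hypothetical helper, `count_and_percentage`, that counts the values satisfying a condition. It is not Validio's implementation. In particular, whether the Enum Values metric counts values inside or outside the allowed set is not stated in this guide, so counting matches here is an assumption.

```python
def count_and_percentage(values, predicate):
    """Return (count, percentage) of values for which predicate is true."""
    total = len(values)
    if total == 0:
        return 0, None  # percentage is undefined for an empty window
    count = sum(1 for v in values if predicate(v))
    return count, 100.0 * count / total


# Made-up sample column with nulls, an empty string, and an unexpected value.
statuses = ["active", None, "", "paused", "unknown", None, "active", "active"]
allowed = {"active", "paused"}  # hypothetical set of allowed enum values

null_count, null_pct = count_and_percentage(statuses, lambda v: v is None)
empty_count, empty_pct = count_and_percentage(statuses, lambda v: v == "")
# Assumption: count the values that ARE in the allowed set.
enum_count, enum_pct = count_and_percentage(statuses, lambda v: v in allowed)

print(null_count, null_pct)
print(empty_count, empty_pct)
print(enum_count, enum_pct)
```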

Metrics & Validity

Evaluate numeric and categorical data to verify expected patterns using metrics such as minimum, maximum, mean, and distribution shift.

| Validator Type | Description | Metric Options |
| --- | --- | --- |
| Numeric Statistics | Check a numeric field against statistical metrics. | Mean, Maximum, Minimum, Standard deviation, Sum |
| Numeric Distribution | Compare two datasets to check if a numeric field's values match the expected distribution. | Relative entropy, Mean ratio, Maximum ratio, Minimum ratio, Standard deviation ratio |
| Categorical Distribution | Compare two datasets to check if a categorical field's values match expected proportions. | Categories added, Categories removed, Categories changed, Relative entropy |
| Volume | Check data volume metrics like count, percentage, duplicates, or distinct values. | Count, Percentage, Duplicate count, Duplicate percentage, Unique count, Unique percentage |
| Relative Time | Compare the time difference between two data subsets. | Minimum difference, Maximum difference, Mean difference |
| Relative Volume | Compare the volume between two data subsets. | Count ratio, Percentage ratio |
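Relative entropy, also known as Kullback-Leibler divergence, measures how far one distribution has shifted from another. The Python sketch below computes it for a categorical field. This guide does not state the direction of the comparison, the logarithm base, or how unseen categories are handled, so the choices here (target relative to reference, natural logarithm, infinity when the target contains a category the reference lacks) are assumptions, and the function name `relative_entropy` is hypothetical.

```python
import math
from collections import Counter


def relative_entropy(target, reference):
    """KL divergence D(target || reference), in nats, over categorical values."""
    p_counts, q_counts = Counter(target), Counter(reference)
    p_total, q_total = len(target), len(reference)
    if p_total == 0 or q_total == 0:
        return None  # nothing to compare
    divergence = 0.0
    for category, p_count in p_counts.items():
        p = p_count / p_total
        q = q_counts.get(category, 0) / q_total
        if q == 0:
            # Category present in target but absent from reference (assumed handling).
            return math.inf
        divergence += p * math.log(p / q)
    return divergence


# Made-up data: the reference is split 50/50, the target has drifted to 75/25.
reference = ["a"] * 50 + ["b"] * 50
target = ["a"] * 75 + ["b"] * 25

shift = relative_entropy(target, reference)
no_shift = relative_entropy(reference, reference)
print(round(shift, 4), no_shift)
```

A value of 0 means the two distributions are identical, and larger values indicate a larger shift.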

Custom

Define and validate custom metrics.

| Validator Type | Description | Metric Options |
| --- | --- | --- |
| Custom SQL | Use SQL queries for tailored validation. Write your own query or describe what you want and let AI generate it. | Custom |

Reference Source Validation

Validators are either single source or reference source. Single source validators calculate metrics based on one dataset, while reference source validators calculate metrics based on fields from two different datasets (sources). You can configure a reference source only for validator types that support comparative analysis, such as Numeric Distribution, Relative Volume, and Categorical Distribution.

Reference source validators only calculate metrics if there is data in the target dataset. For example, a Categories removed validator yields a result of 1 when the reference dataset has 4 categories and the target dataset has 3 of them. If the target dataset has 0 categories, the validator does not return any result, because the target dataset has no data to calculate metrics on.
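The Categories removed example can be sketched in Python as follows. This is a minimal illustration of the behavior described above, not Validio's implementation. The function name `categories_removed` and the category labels are hypothetical, and `None` stands in for "no result".

```python
def categories_removed(target, reference):
    """Count categories present in the reference dataset but missing from the target.

    Returns None when the target has no data, mirroring the rule that
    reference source validators only calculate metrics if the target has data.
    """
    if not target:
        return None
    return len(set(reference) - set(target))


reference = ["A", "B", "C", "D"]  # 4 categories
target = ["A", "B", "C"]          # 3 of those categories

print(categories_removed(target, reference))  # 1
print(categories_removed([], reference))      # None (no result)
```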

For more information and configuration examples, see Reference Source Validation.