Validator Types

Overview of supported Validator types used for calculating metrics.

Validio groups its validators by the use case you want to monitor, such as pipeline health, data consistency, and completeness. You can also create a validator from a SQL query to monitor custom metrics. This guide lists the validator types and the metrics that each one supports.

📘 Note

Validators calculate metrics over a window. For example, a validator calculates the mean value over a daily window, and then validates if these daily mean values follow an expected seasonal pattern. The About Validators and Configuring Validators guides explain the concepts related to validators and provide useful context for configuration.
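The windowing idea in the note above can be sketched in a few lines of Python. This is only an illustration of computing a mean per daily window, not Validio's implementation; the function name `daily_means` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(records):
    """Group (timestamp, value) pairs into daily windows and return the mean per day.

    Hypothetical helper for illustration only; it is not part of any Validio API.
    """
    windows = defaultdict(list)
    for timestamp, value in records:
        # Each calendar day acts as one window.
        windows[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(windows.items())}


# Made-up sample data: two datapoints on the first day, one on the second.
records = [
    (datetime(2024, 1, 1, 9), 10.0),
    (datetime(2024, 1, 1, 17), 20.0),
    (datetime(2024, 1, 2, 9), 30.0),
]
result = daily_means(records)
```

A validator would then compare each daily value against what it expects, for example a threshold or a seasonal pattern. That comparison step is deliberately left out of the sketch.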

❗️ Important

The metadata validators, Freshness (metadata) and Row Count (metadata), are available only for BigQuery and Snowflake sources.

Pipeline Health

Evaluate data pipeline reliability to identify issues in ingestion or processing by monitoring row counts and freshness.

| Validator Type | Description | Metric |
| --- | --- | --- |
| Freshness | Ensure data is timely by checking if its timestamp is within the expected range. | Freshness |
| Row Count | Verify the number of rows in a table meets expected thresholds. | Count |
| Freshness (metadata) | Ensure data is timely by checking if the table was last updated within the expected time range. **Note:** This check is based on the warehouse metadata. | Freshness |
| Row Count (metadata) | Ensure the row count in the table is within the expected range. **Note:** This check is based on the warehouse metadata. | Count |

Uniqueness

Maintain quality standards by checking for duplicate or distinct values in specific fields.

| Validator Type | Description | Metric |
| --- | --- | --- |
| Distinct Values | Ensure a column contains only distinct values or matches expected uniqueness. | Unique Count, Unique Percentage |
| Duplicate Values | Identify duplicate entries in a column to maintain data integrity. | Duplicate Count, Duplicate Percentage |

Completeness

Ensure datasets meet completeness requirements by checking for null values, empty strings, or missing data.

| Validator Type | Description | Metric |
| --- | --- | --- |
| Null Values | Check for null values to ensure data completeness and reliability. | Count, Percentage |
| Empty Strings | Check a specific field for empty string values to maintain validity. | Count, Percentage |
| Enum Values | Ensure a field matches a predefined set of allowed values. | Count, Percentage |
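As a rough illustration of the Count and Percentage metrics listed above, the sketch below counts the values in one window that satisfy a condition and expresses that count as a share of the window. The exact definitions Validio uses are not stated in this guide, so the helper `count_and_percentage` and its handling of an empty window are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple


def count_and_percentage(
    values: Sequence[Optional[str]],
    matches: Callable[[Optional[str]], bool],
) -> Tuple[int, float]:
    """Return how many values satisfy `matches`, and that count as a percentage.

    Hypothetical helper; returning 0.0 for an empty window is an assumption.
    """
    total = len(values)
    count = sum(1 for value in values if matches(value))
    percentage = 100.0 * count / total if total else 0.0
    return count, percentage


# Made-up window of four field values: one null and one empty string.
window = ["a", None, "", "b"]

null_result = count_and_percentage(window, lambda v: v is None)
empty_result = count_and_percentage(window, lambda v: v == "")
```

The same helper covers the Enum Values case by passing a condition that tests membership in the allowed set.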

Metrics & Validity

Evaluate numeric and categorical data to verify expected patterns using metrics such as minimum, maximum, mean, and distribution shift.

| Validator Type | Description | Metric |
| --- | --- | --- |
| Numeric Statistics | Check a numeric field against metrics like maximum, minimum, mean, or sum. | Mean, Maximum, Minimum, Standard Deviation, Sum |
| Numeric Distribution | Check if a numeric field’s values match the expected distribution, comparing two datasets. | Relative Entropy, Mean Ratio, Maximum Ratio, Minimum Ratio, Standard Deviation Ratio |
| Categorical Distribution | Ensure a categorical field’s values match expected proportions, comparing two datasets. | Categories Added, Categories Removed, Categories Changed, Relative Entropy |
| Volume | Check data volume metrics like count, percentage, duplicates, or distinct values. | Count, Percentage, Duplicate Count, Duplicate Percentage, Unique Count, Unique Percentage |
| Relative Time | Compare the time difference between two data subsets. | Minimum Difference, Maximum Difference, Mean Difference |
| Relative Volume | Compare the volume between two data subsets. | Count Ratio, Percentage Ratio |

Custom

Define and validate custom metrics.

| Validator Type | Description | Metric |
| --- | --- | --- |
| Custom SQL | Use SQL queries for tailored validation of metrics and conditions. | Custom |

Reference Validators

Validators are either single-source or reference validators. Single-source validators calculate metrics based on one dataset, while reference validators calculate metrics based on multiple fields from two different datasets: a reference dataset and a target dataset. Reference validators (Numeric Distribution, Categorical Distribution, Relative Time, and Relative Volume) only calculate metrics if there is data in the target dataset.

For example, a Categorical Distribution validator that uses the Categories Removed metric yields a result of 1 when the reference dataset has 4 categories and the target dataset has 3. If the target dataset has 0 categories, the validator does not return any result, because the target dataset has no data to calculate metrics on.
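The example above can be written out as a small Python sketch. It assumes that Categories Removed counts the categories present in the reference dataset but absent from the target dataset; the function name `categories_removed` and the category labels are hypothetical, and this is not Validio's implementation.

```python
from typing import Iterable, Optional


def categories_removed(reference: Iterable[str], target: Iterable[str]) -> Optional[int]:
    """Count categories found in the reference data but missing from the target data.

    Returns None when the target has no data, mirroring the rule that reference
    validators calculate nothing without target data. Hypothetical helper.
    """
    target_categories = set(target)
    if not target_categories:
        # No target data: there is nothing to calculate a metric on.
        return None
    return len(set(reference) - target_categories)


# Made-up category labels: the reference has 4 categories, the target has 3 of them.
with_target = categories_removed(["a", "b", "c", "d"], ["a", "b", "c"])  # 1
empty_target = categories_removed(["a", "b", "c", "d"], [])  # None
```

Returning `None` instead of 4 for the empty target keeps "no result" distinct from "every category was removed".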