HomeDocumentationRecipesChangelog
HomeRequest DemoContact
Documentation
HomeRequest DemoContact

About Filters

Use filters to specify the datapoints to include in the metric calculation for your validator. You can define filters for segmentations on a validator. The following table lists the types of filters that Validio supports:

FiltersDescription
No FilterNo filter applied to data. Process all available datapoint in a Window.
Boolean FilterEnsures that a Boolean value is either true or false.
Enum FilterEnsures that a string only contains values from a fixed set of specified values.
Null FilterEnsures that a field value is NULL.
String FilterEnsures that a string adheres to certain criteria.
Threshold filterEnsures that a numeric value is either under or over a certain threshold.
SQL FilterFilter records based on a SQL WHERE statement.

For example, to count only the NULL values for a certain field, use a Volume Validator and apply a NULL filter.

📘

Note

A filter applied on a segmentation will be true for all validators that use the same segmentation.


Boolean Filter

Filter records based on a boolean field.

Parameter nameParameter values
Filter typeBoolean filter
FieldList of source fields with Boolean data types
OperatorIs true
Is false

Enum Filter

Filter records based on whether field values are part of a defined set.

Parameter nameParameter values
Field List of source fields with string data types
OperatorIs in
Is not in
ValueList of strings

Null Filter

Filters records on whether a field values are NULL.

Parameter nameParameter value
FieldList of source fields
OperatorIs
Is not

String Filter

Filter records on whether string field values match specific criteria.

Parameter nameParameter values
FieldList of source fields with string data types
OperatorIs empty
Is not empty
Contains
Does not contain
Starts with
Ends with
Is exactly
Regex
ValueString

🚧

Caution

Depending on source system, Validio leverages different regex syntax:

  • For Data Warehouses and Query Engines, regex filters are applied in the source system. For more information on that syntax, refer to the regex syntax of your specific source system.
  • For Object Storages and Data Streams, the regex syntax is compatible with the Rust regex library. For information on that syntax, refer to Rust - regex crate syntax.

Threshold Filter

Filter records on whether numeric values comply to a threshold.

Parameter nameParameter values
FieldList of source fields with numeric data types
Operator (*)Equal to
Not equal to
Greater than
Greater than or equal to
Less than
Less than or equal to
ValueFloat number

* Specifies the 'allowed region'. For example, Greater than or equal than 5 specifies the allowed region, and all values strictly less than five are be filtered.

SQL Filter

📘

Note

SQL filter works for all Source types, including non-SQL data sources.

You define a SQL filter by typing a valid SQL WHERE statement in the text box, but omitting the word "WHERE". An example of a valid SQL filter is: some_column = 5.

Different data sources may use different SQL syntax. For more information, refer to the documentation of your data source. For non-SQL data sources, such as Data Streams or Object Storages, you find the supported functionality and syntax in the SQL filter expression reference.

SQL Filter Recipes

Below are several useful recipes for common SQL filters. Each recipe is written in a generic syntax, or in syntax specific for BigQuery or Snowflake, when required.

You can use these recipes as a starting point when writing SQL filters for other data sources, such as Redshift or Athena.

Filter operators

For example, some_column matches a filter with several operators.

some_column IS NULL OR (some_column > 3 AND some_column NOT IN(4,6))

Exists in set

For example, some_column match either of the values 'some_value' or 'some_other_value'.

some_column IN ( 'some_value', 'some_other_value' )

Missing values

For example, some_column is null or matches a value that represents a missing value.

some_column IS NULL OR LOWER(some_column) IN('', '0', '-', 'null')

Semi-structured data

For example, some_array contains at least one element with value 2.

2 IN(SELECT * FROM UNNEST(some_array))
array_contains(2, some_array)

String pattern

For example, some_column does not contain a valid URL.

NOT REGEXP_CONTAINS(some_column, '^(http(s)?:\\/\\/.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)')
NOT REGEXP_LIKE(some_column, '^(http(s)?:\\/\\/.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)')