HomeDocumentationRecipesChangelog
HomeRequest DemoContact
Documentation
HomeRequest DemoContact

Filters

Use filters to only include datapoints which fulfills a specified filter in the metric calculation.

FiltersDescription
No FilterNo filter applied to data. Process all available datapoint in a Window.
Boolean FilterEnsures that a Boolean value is either true or false.
Enum FilterEnsures that a string only contains values from a fixed set of specified values.
Null FilterEnsures that a field value is NULL.
String FilterEnsures that a string adheres to certain criteria.
Threshold filterEnsures that a numeric value is either under or over a certain threshold.
SQL FilterFilter records based on a SQL WHERE statement.

👍

Example

Want to count only the NULL values for a certain field? Then you can use a Volume Validator and apply a NULL filter.


Boolean Filter

Filter records based on a boolean field.

Parameter name

Parameter values

Filter type

Boolean filter

Field

List of source fields with Boolean data types

Operator

Is true
Is false

Enum Filter

Filter records based on whether field values are part of a defined set.

Parameter name

Parameter values

Field

List of source fields with string data types

Operator

Is in
Is not in

Value

List of strings

Null Filter

Filters records on whether a field values are NULL.

Parameter name

Parameter value

Field

List of source fields

Operator

Is
Is not

String Filter

Filter records on whether string field values match specific criteria.

Parameter name

Parameter values

Field

List of source fields with string data types

Operator

Is empty
Is not empty
Contains
Does not contain
Starts with
Ends with
Is exactly
Regex

Value

String

🚧

Caution

Depending on source system, Validio leverages different regex syntax:

  • For Data Warehouses and Query Engines, regex filters are applied in the source system. For more information on that syntax, refer to the regex syntax of your specific source system.
  • For Object Storages and Data Streams, the regex syntax is compatible with the Rust regex library. For information on that syntax, refer to Rust - regex crate syntax.

Threshold Filter

Filter records on whether numeric values comply to a threshold.

Parameter name

Parameter values

Field

List of source fields with numeric data types

Operator (*)

Equal to
Not equal to
Greater than
Greater than or equal to
Less than
Less than or equal to

Value

Float number

*Specifies the 'allowed region'. For example,Greater than or equal than 5 specifies the allowed region, and all values strictly less than five are be filtered.

SQL Filter

👍

Available across sources

SQL filter works for all Source types, including non-SQL data sources.

You define a SQL filter by typing a valid SQL WHERE statement in the text box, but omitting the word "WHERE". An example of a valid SQL filter is: some_column = 5.

Different data sources may use different SQL syntax. For more information, refer to the documentation of your data source. For non-SQL data sources, such as Data Streams or Object Storages, you find the supported functionality and syntax in the SQL filter expression reference.

SQL Filter Recipes

Below are several useful recipes for common SQL filters. Each recipe is written in a generic syntax, or in syntax specific for BigQuery or Snowflake, when required.

You can use these recipes as a starting point when writing SQL filters for other data sources, such as Redshift or Athena.

Filter operators

For example,some_column matches a filter with several operators.

some_column IS NULL OR (some_column > 3 AND some_column NOT IN(4,6))

Exists in set

For example,some_column match either of the values 'some_value' or 'some_other_value'.

some_column IN ( 'some_value', 'some_other_value' )

Missing values

For example,some_column is null or matches a value that represents a missing value.

some_column IS NULL OR LOWER(some_column) IN('', '0', '-', 'null')

Semi-structured data

For example,some_array contains at least one element with value 2.

2 IN(SELECT * FROM UNNEST(some_array))
array_contains(2, some_array)

String pattern

For example,some_column does not contain a valid URL.

NOT REGEXP_CONTAINS(some_column, '^(http(s)?:\\/\\/.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)')
NOT REGEXP_LIKE(some_column, '^(http(s)?:\\/\\/.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)')