About Filters
Use filters to only include datapoints which fulfills a specified filter in the metric calculation.
Filters | Description |
---|---|
No Filter | No filter applied to data. Process all available datapoint in a Window. |
Boolean Filter | Ensures that a Boolean value is either true or false. |
Enum Filter | Ensures that a string only contains values from a fixed set of specified values. |
Null Filter | Ensures that a field value is NULL. |
String Filter | Ensures that a string adheres to certain criteria. |
Threshold filter | Ensures that a numeric value is either under or over a certain threshold. |
SQL Filter | Filter records based on a SQL WHERE statement. |
For example, to count only the NULL
values for a certain field, use a Volume Validator and apply a NULL
filter.
Boolean Filter
Filter records based on a boolean field.
Parameter name | Parameter values |
---|---|
Filter type | Boolean filter |
Field | List of source fields with Boolean data types |
Operator | Is true Is false |
Enum Filter
Filter records based on whether field values are part of a defined set.
Parameter name | Parameter values |
---|---|
Field | List of source fields with string data types |
Operator | Is in Is not in |
Value | List of strings |
Null Filter
Filters records on whether a field values are NULL.
Parameter name | Parameter value |
---|---|
Field | List of source fields |
Operator | Is Is not |
String Filter
Filter records on whether string field values match specific criteria.
Parameter name | Parameter values |
---|---|
Field | List of source fields with string data types |
Operator | Is empty Is not empty Contains Does not contain Starts with Ends with Is exactly Regex |
Value | String |
Caution
Depending on source system, Validio leverages different regex syntax:
- For Data Warehouses and Query Engines, regex filters are applied in the source system. For more information on that syntax, refer to the regex syntax of your specific source system.
- For Object Storages and Data Streams, the regex syntax is compatible with the
Rust regex library
. For information on that syntax, refer to Rust - regex crate syntax.
Threshold Filter
Filter records on whether numeric values comply to a threshold.
Parameter name | Parameter values |
---|---|
Field | List of source fields with numeric data types |
Operator (*) | Equal to Not equal to Greater than Greater than or equal to Less than Less than or equal to |
Value | Float number |
* Specifies the 'allowed region'. For example, Greater than or equal than 5
specifies the allowed region, and all values strictly less than five are be filtered.
SQL Filter
Note
SQL filter works for all Source types, including non-SQL data sources.
You define a SQL filter by typing a valid SQL WHERE
statement in the text box, but omitting the word "WHERE". An example of a valid SQL filter is: some_column = 5
.
Different data sources may use different SQL syntax. For more information, refer to the documentation of your data source. For non-SQL data sources, such as Data Streams or Object Storages, you find the supported functionality and syntax in the SQL filter expression reference.
SQL Filter Recipes
Below are several useful recipes for common SQL filters. Each recipe is written in a generic syntax, or in syntax specific for BigQuery or Snowflake, when required.
You can use these recipes as a starting point when writing SQL filters for other data sources, such as Redshift or Athena.
Filter operators
For example, some_column
matches a filter with several operators.
some_column IS NULL OR (some_column > 3 AND some_column NOT IN(4,6))
Exists in set
For example, some_column
match either of the values 'some_value'
or 'some_other_value'
.
some_column IN ( 'some_value', 'some_other_value' )
Missing values
For example, some_column
is null or matches a value that represents a missing value.
some_column IS NULL OR LOWER(some_column) IN('', '0', '-', 'null')
Semi-structured data
For example, some_array
contains at least one element with value 2.
2 IN(SELECT * FROM UNNEST(some_array))
array_contains(2, some_array)
String pattern
For example, some_column
does not contain a valid URL.
NOT REGEXP_CONTAINS(some_column, '^(http(s)?:\\/\\/.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)')
NOT REGEXP_LIKE(some_column, '^(http(s)?:\\/\\/.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)')
Updated about 1 month ago