About Segmentation
Validate your data per segment to detect and resolve issues hidden deep inside your data.
In Validio, you can configure Segmentation to validate metrics on segments of your data. For example, if a Segmentation is specified for the field Marital status, metrics are validated for each distinct value within that field.
Segmentation works similarly to GROUP BY in SQL.
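The GROUP BY analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not Validio's API or implementation: the records, the field names, and the helper mean_per_segment are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; field names and values are invented for illustration.
rows = [
    {"marital_status": "Married", "annual_salary": 60000},
    {"marital_status": "Single", "annual_salary": 40000},
    {"marital_status": "Married", "annual_salary": 80000},
    {"marital_status": "Single", "annual_salary": 50000},
]

def mean_per_segment(rows, segment_field, metric_field):
    """Group rows by one field and compute the mean of another field per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_field]].append(row[metric_field])
    return {segment: mean(values) for segment, values in groups.items()}

# One metric value per distinct value of the segmentation field.
per_segment = mean_per_segment(rows, "marital_status", "annual_salary")
```

Each distinct value of the segmentation field ends up with its own metric value, which can then be validated independently of the other segments.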
Multi-field segments
You can create Segmentations on multiple fields. For example, if a Segmentation is specified for the fields Country, Gender, and Marital status, the metric average Annual salary is validated for each combination of distinct values within Country, Gender, and Marital status.
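As a rough sketch of what a multi-field segment means, the following Python groups hypothetical records by the combination of three fields and computes the average salary per combination. The data and field names are assumptions made for illustration; this is not how Validio is configured or implemented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; values are invented for illustration.
rows = [
    {"country": "SE", "gender": "F", "marital_status": "Married", "annual_salary": 50000},
    {"country": "SE", "gender": "F", "marital_status": "Married", "annual_salary": 70000},
    {"country": "SE", "gender": "M", "marital_status": "Single", "annual_salary": 45000},
    {"country": "US", "gender": "F", "marital_status": "Single", "annual_salary": 90000},
]

segment_fields = ("country", "gender", "marital_status")

# A segment is identified by the tuple of values across all segmentation fields.
groups = defaultdict(list)
for row in rows:
    key = tuple(row[field] for field in segment_fields)
    groups[key].append(row["annual_salary"])

mean_salary = {key: mean(values) for key, values in groups.items()}
```

Only combinations that actually occur in the data produce a segment. The upper bound is the product of the distinct-value counts of each field, so the number of segments can grow quickly as fields are added.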
Validio has customers that use thousands of segments, and is continuously increasing the number of supported segments. However, keep in mind that a very large number of segments might affect performance.
Why should I use Segmentation?
Information loss occurs when aggregating data. Conversely, by segmenting the data, a more granular analysis can be performed.
A segmentation example
A retail organization wants to validate their price data to make sure their products are properly priced. For validation purposes, they want to use the fields price and currency. Because of differences in currency, the prices have different orders of magnitude, which means that validating datapoints from the price column alone makes little sense. The data must be segmented by currency before performing a data quality validation to make sure there are no anomalies.
Think of the difference in the order of magnitude if the same price for a specific item is expressed in USD versus Iranian Rial, where the conversion rate is roughly 1 USD = 40 000 Iranian Rial.
If no Segmentation on currency is applied when validating price data, the retail organization would be comparing apples with cars.
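The effect can be sketched with a deliberately simple check. The prices below are made up, and the rule used here (flag any value more than five times above or below the median) is a stand-in chosen for illustration, not Validio's anomaly detection. The helper flag_outliers is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical prices; one USD price (400) is clearly off for its segment.
prices = [
    ("USD", 10), ("USD", 11), ("USD", 9), ("USD", 12), ("USD", 400),
    ("IRR", 400000), ("IRR", 440000), ("IRR", 360000), ("IRR", 480000),
]

def flag_outliers(values, factor=5):
    """Return the values more than `factor` times above or below the median."""
    mid = median(values)
    return [v for v in values if v > mid * factor or v < mid / factor]

# Without segmentation: every price is compared against one global median.
flagged_global = flag_outliers([price for _, price in prices])

# With segmentation: each currency is compared against its own median.
by_currency = defaultdict(list)
for currency, price in prices:
    by_currency[currency].append(price)
flagged_per_segment = {c: flag_outliers(v) for c, v in by_currency.items()}
```

In this toy data the global check flags every price except the mispriced one, because 400 happens to sit in the middle of the mixed USD and IRR values. The per-currency check flags only the 400 USD price and nothing in the IRR segment.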