About Segmentation

Validate your data per segment to detect and resolve issues hidden deep inside your data.

In Validio, you can configure Segmentation to validate metrics on segments of your data. For example, if a Segmentation is specified for the field Marital status, metrics are validated for each distinct value within that field.

Segmentation allows you to validate data within segments of your data.

Segmentation works similarly to GROUP BY in SQL.
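The GROUP BY analogy can be sketched in plain Python. This is only an illustration of the idea, not Validio's API: the sample records, the field names, and the helper `segment_metric` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records; not real data and not a Validio data format.
records = [
    {"marital_status": "Married", "annual_salary": 52000},
    {"marital_status": "Single", "annual_salary": 41000},
    {"marital_status": "Married", "annual_salary": 58000},
    {"marital_status": "Single", "annual_salary": 39000},
]


def segment_metric(rows, segment_field, value_field, metric=mean):
    """Compute one metric value per distinct value of segment_field.

    Roughly equivalent to:
    SELECT segment_field, AVG(value_field) ... GROUP BY segment_field
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_field]].append(row[value_field])
    return {segment: metric(values) for segment, values in groups.items()}


# One metric value per distinct marital status.
per_segment = segment_metric(records, "marital_status", "annual_salary")
print(per_segment)
```

Each distinct value of the segmented field gets its own metric value, which can then be validated independently of the other segments.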

Multi-field segments

You can create Segmentations on multiple fields. For example, if a Segmentation is specified for the fields Country, Gender, and Marital status, the metric average Annual salary is validated for each combination of distinct values within Country, Gender, and Marital status.

Validio has customers that use thousands of segments, and the number of supported segments is continuously increasing. However, keep in mind that a very large number of segments might affect performance.

In this example, we segment the data on three fields, `Country`, `Gender`, and `Marital status`, and track the average `Annual salary` for each segment.

Example Segmentation on the three fields: Country, Gender, and Marital status, tracking average Annual salary for each of the segments.
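As a rough sketch of multi-field segmentation, each segment can be thought of as one combination of distinct values across the chosen fields. The rows, field names, and variable names below are hypothetical and do not reflect how Validio computes segments internally.

```python
from collections import defaultdict
from math import prod
from statistics import mean

# Hypothetical sample rows; values are made up for illustration.
rows = [
    {"country": "SE", "gender": "F", "marital_status": "Married", "annual_salary": 60000},
    {"country": "SE", "gender": "F", "marital_status": "Married", "annual_salary": 64000},
    {"country": "SE", "gender": "M", "marital_status": "Single", "annual_salary": 50000},
    {"country": "US", "gender": "F", "marital_status": "Single", "annual_salary": 70000},
]

segment_fields = ("country", "gender", "marital_status")

# A segment is identified by the tuple of values of all segment fields.
salaries_by_segment = defaultdict(list)
for row in rows:
    key = tuple(row[field] for field in segment_fields)
    salaries_by_segment[key].append(row["annual_salary"])

# Average annual salary per observed combination.
average_salary = {key: mean(values) for key, values in salaries_by_segment.items()}

# Upper bound on the number of segments: the product of the number of
# distinct values in each field. Only combinations that actually occur
# in the data show up as segments.
max_segments = prod(len({row[field] for row in rows}) for field in segment_fields)

print(average_salary)
print(len(average_salary), "observed segments out of at most", max_segments)
```

Because the upper bound is a product of per-field distinct counts, adding a field with many distinct values can multiply the number of segments quickly, which is worth keeping in mind given the performance note above.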

Why should I use Segmentation?

Information loss occurs when aggregating data. Conversely, by segmenting the data, a more granular analysis can be performed.

A segmentation example

A retail organization wants to validate its price data to make sure its products are properly priced. For validation purposes, it wants to use the fields price and currency. Because of the differences in currency, the prices have different orders of magnitude, which means that validating data points from the price column alone makes little sense. The data must be segmented by currency before performing a data quality validation to make sure there are no anomalies.

Think of the difference in order of magnitude when the same price for a specific item is expressed in USD versus Iranian rial (IRR), where the conversion rate is roughly 1 USD = 40 000 IRR.

If no Segmentation by currency is applied when validating the price data, the retail organization would be comparing apples with cars.

Example of an anomaly detected in the Currency = IRR segment.
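The currency example can be illustrated with a small sketch. The prices are made up, and the median-ratio rule in the hypothetical helper `outliers` is a deliberately simple stand-in chosen for illustration; it is not Validio's anomaly detection method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (currency, price) pairs. The last IRR price is missing
# two zeros, which is the kind of anomaly we would like to catch.
prices = [
    ("USD", 10.0), ("USD", 10.5), ("USD", 9.5), ("USD", 10.2), ("USD", 9.8),
    ("IRR", 400000.0), ("IRR", 420000.0), ("IRR", 380000.0), ("IRR", 410000.0),
    ("IRR", 4000.0),
]


def outliers(values, factor=5.0):
    """Return values more than `factor` times above or below the median."""
    mid = median(values)
    return [v for v in values if v > mid * factor or v < mid / factor]


# Without segmentation: all prices are pooled regardless of currency.
global_flags = outliers([price for _, price in prices])

# With segmentation: the same rule is applied within each currency.
by_currency = defaultdict(list)
for currency, price in prices:
    by_currency[currency].append(price)
segment_flags = {currency: outliers(values) for currency, values in by_currency.items()}

print("pooled:", global_flags)
print("segmented:", segment_flags)
```

In this sketch, the pooled check flags every normal price, because the pooled median sits between the two currencies, while the faulty 4000 IRR value goes unnoticed. Applied per currency, the same rule flags only the faulty IRR price.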