Pipeline partitioning refers to dividing your data into partitions, allowing you to monitor and validate metrics on subsets of your data.
Data is partitioned by categorical feature values.
You can also partition by multiple features, creating multi-feature partitions.
Validio lets you partition on almost as many features as you want. For technical reasons, the practical limit is the total number of partitions, which is driven by the number of features and the cardinality of each feature:
#Partitions = [#distinct categorical values in feature 1] x [#distinct categorical values in feature 2] x … x [#distinct categorical values in feature N]
Note that this is an upper bound on the number of partitions. The actual number can be lower if some combinations of feature values never occur in the data.
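As a rough illustration of the difference between the upper bound and the actual partition count, here is a minimal Python sketch. It is not Validio's implementation or API: the records, the feature names "country" and "currency", and the helper functions are all hypothetical and exist only to make the arithmetic concrete.

```python
from collections import defaultdict
from math import prod

# Hypothetical records; "country" and "currency" are the categorical
# features chosen for partitioning in this sketch.
records = [
    {"country": "US", "currency": "USD", "price": 10.0},
    {"country": "US", "currency": "USD", "price": 12.5},
    {"country": "SE", "currency": "SEK", "price": 105.0},
    {"country": "SE", "currency": "EUR", "price": 9.5},
    {"country": "DE", "currency": "EUR", "price": 11.0},
]
partition_features = ["country", "currency"]


def partition_upper_bound(rows, features):
    """Multiply the number of distinct values of each partition feature."""
    return prod(len({row[f] for row in rows}) for f in features)


def partition_rows(rows, features):
    """Group rows by the tuple of their partition feature values."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        partitions[key].append(row)
    return dict(partitions)


upper_bound = partition_upper_bound(records, partition_features)
partitions = partition_rows(records, partition_features)

print("upper bound:", upper_bound)
print("actual partitions:", len(partitions))
```

With three distinct countries and three distinct currencies, the upper bound is 3 x 3 = 9, but only four combinations actually occur in these sample records, so only four partitions exist.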
Validio currently has customer production deployments with tens of thousands of partitions, and the number of partitions supported is continuously increasing.
Aggregating data loses information. Partitioning, conversely, allows more granular analysis.
For instance, let’s say there's a retail organization that works with customers across the globe. They may want to monitor their price data to ensure that items are properly priced. One column holds the feature “price” and another holds “currency”. Monitoring the price column as a whole makes very little sense, since the prices differ by orders of magnitude depending on the currency. Before running a data quality validation to check for anomalies, the data must be partitioned by currency. Just think about the difference in magnitude when the very same price for a specific item is expressed in USD versus Iranian Rial, where the conversion rate is roughly 1 USD = 40 000 Iranian Rial. Without partitioning, we’d be comparing apples with cars.
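The effect can be sketched in a few lines of Python. This is a toy illustration, not a Validio validator: the sample prices are made up, and the rule "flag anything more than 5x above or below the median" is a deliberately simple stand-in for a real anomaly check.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (currency, price) rows. The 400.0 USD entry is the one
# suspicious price; every other value is normal for its currency.
rows = [
    ("USD", 9.0), ("USD", 10.0), ("USD", 11.0), ("USD", 10.0),
    ("USD", 400.0),
    ("IRR", 380_000.0), ("IRR", 400_000.0),
    ("IRR", 420_000.0), ("IRR", 410_000.0),
]


def flag_outliers(prices, factor=5.0):
    """Toy rule: flag prices more than `factor` times above or below the median."""
    mid = median(prices)
    return [p for p in prices if p > mid * factor or p < mid / factor]


# Unpartitioned: every price is compared against one pooled median.
pooled_flags = flag_outliers([price for _, price in rows])

# Partitioned: each currency is compared against its own median.
by_currency = defaultdict(list)
for currency, price in rows:
    by_currency[currency].append(price)
partitioned_flags = {
    currency: flag_outliers(prices) for currency, prices in by_currency.items()
}

print("pooled:", pooled_flags)
print("partitioned:", partitioned_flags)
```

On this sample, the pooled check flags all eight normal prices and misses the 400 USD entry, because the mix of currencies makes the pooled median meaningless. The per-currency check flags only the 400 USD entry.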
Partition views can be found on the Monitor and Filter dashboard where you can easily toggle between the different partitions you’ve created in the pipeline set-up.