Getting Started with Validio
Set up your first data validation in 5 minutes.
1. Install Validio
You can either install Validio in your VPC or host it with us, depending on your preferences and needs. For more information, read about our Customer virtual private cloud and our Managed solution.
Until Validio is available on the GCP and AWS marketplaces, you can reach out to us at [email protected]. We provide demo sessions where you can learn more about the key features relevant to your data stack!
2. Connect Validio to your data
Once Validio is running in your environment, authenticate and connect Validio so that it can read data from your Source.
- Click on + New Source to start the Source configuration wizard.
In this example, we create a Validio Demo Source.
Credential and configuration parameters differ depending on the Source type
For more information, refer to Credentials and Sources.
2.1 Create a Window
Configure a Window to define the window (batch) of data in which your Source is validated. For example, a Window determines which groups of datapoints a mean is monitored and validated on.
- Select the Window type you want to create in the Source configuration wizard.
In this example, we select the Window type Fixed batch window and specify the following:
- Name (of the window): Batches of 256 datapoints on "event_time"
- Data-time field: event_time
- Fixed batch size: 256
This triggers a window every 256 datapoints. The datapoints are ordered by the event_time field.
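To make the Fixed batch window behavior concrete, here is a minimal conceptual sketch in plain Python. It is not Validio's API or implementation: records are modeled as dicts, the function name fixed_batch_windows is hypothetical, and holding back an incomplete trailing batch is an assumption about when a window "triggers".

```python
from typing import Any, Iterator

# A record is one datapoint from the Source, modeled here as a plain dict.
Record = dict[str, Any]


def fixed_batch_windows(
    records: list[Record], time_field: str = "event_time", batch_size: int = 256
) -> Iterator[list[Record]]:
    """Yield consecutive, non-overlapping batches of `batch_size` records.

    Records are ordered by `time_field` first. A trailing batch with fewer
    than `batch_size` records is not yielded (assumption: a window only
    triggers once it is full).
    """
    ordered = sorted(records, key=lambda r: r[time_field])
    for start in range(0, len(ordered) - batch_size + 1, batch_size):
        yield ordered[start : start + batch_size]


# 600 synthetic datapoints, supplied out of order.
rows = [{"event_time": t, "Age": 20 + t % 40} for t in reversed(range(600))]
windows = list(fixed_batch_windows(rows))
print(len(windows), [len(w) for w in windows])  # 2 [256, 256]
print(windows[1][0]["event_time"])  # 256
```

With 600 datapoints, two full windows of 256 are produced and the remaining 88 datapoints wait until enough new data arrives to fill the next window.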
3. Create Segmentation
Configure a Segmentation to define the segments in your Source that metrics are validated on. By default, Validio uses the unsegmented setting. For example, if we create a Segmentation on Country, the metrics for Country = USA are validated independently from the metrics for Country = Sweden.
- Click on + New Segmentations to start the Segmentation configuration wizard.
In this example, we create a Segmentation on the field Gender, which in this case contains either the value Male or Female.
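Conceptually, a Segmentation partitions the data by the value of a field and computes each metric separately per partition. The sketch below illustrates that idea in plain Python; it is not how Validio is implemented, and the helper names segment and metric_per_segment are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

Record = dict[str, Any]


def segment(records: list[Record], field: str) -> dict[Any, list[Record]]:
    """Group records by the value of `field`; each group is one segment."""
    groups: dict[Any, list[Record]] = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


def metric_per_segment(
    records: list[Record], field: str, metric: Callable[[list[Record]], Any]
) -> dict[Any, Any]:
    """Compute `metric` separately for each segment, so segments never mix."""
    return {key: metric(rows) for key, rows in segment(records, field).items()}


rows = [
    {"Gender": "Male", "Age": 30},
    {"Gender": "Female", "Age": 44},
    {"Gender": "Male", "Age": 50},
]
print(metric_per_segment(rows, "Gender", len))  # {'Male': 2, 'Female': 1}
```

Here the metric is a simple row count, so each segment gets its own value that is never influenced by rows from the other segment.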
4. Create a Validator
Configure a Validator to define the metrics to validate on specific fields in your Source. To control how the Validator behaves, you can apply a Segmentation and Window you have already created, and optionally add filters.
- Click on + New Validator to start the Validator configuration wizard.
In this example, we create a Numeric Validator with the following settings:
Config:
- Metric: Mean
- Source field(s): Age
- We do not use the Initialize with Backfill option
Source config:
- Segmentation: Gender
- Window: Batches of 256 datapoints on "event_time"
Filter:
- Filter type: Threshold filter
- Field: Working_hours_weekly
- Operator: equal
- Value: 40
This Validator calculates the mean of Age, segmented by Gender, after reading 256 datapoints.
Our Threshold filter only includes rows where an individual is working exactly 40 hours per week. All other rows are excluded from the validation.
Metrics in our example
The Validator in our example yields two mean metrics:
- Mean of Age for Male, working exactly 40 hours per week.
- Mean of Age for Female, working exactly 40 hours per week.
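The sketch below shows, in plain Python, what this Validator computes for a single window: drop rows that fail the equal filter on Working_hours_weekly, then take the mean of Age separately for each Gender segment. It is an illustration of the semantics under these assumptions, not Validio code; the function name segmented_filtered_mean is hypothetical, and the window holds five rows instead of 256 for readability.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def segmented_filtered_mean(
    window: list[Record],
    value_field: str,
    segment_field: str,
    filter_field: str,
    filter_value: Any,
) -> dict[Any, float]:
    """Mean of `value_field` per segment, over rows where filter_field == filter_value."""
    sums: dict[Any, float] = defaultdict(float)
    counts: dict[Any, int] = defaultdict(int)
    for row in window:
        # Threshold filter with operator "equal": skip every non-matching row.
        if row[filter_field] != filter_value:
            continue
        key = row[segment_field]
        sums[key] += row[value_field]
        counts[key] += 1
    # One mean metric per segment that still has rows after filtering.
    return {key: sums[key] / counts[key] for key in counts}


window = [
    {"Gender": "Male", "Age": 30, "Working_hours_weekly": 40},
    {"Gender": "Male", "Age": 50, "Working_hours_weekly": 40},
    {"Gender": "Male", "Age": 99, "Working_hours_weekly": 20},  # excluded by the filter
    {"Gender": "Female", "Age": 41, "Working_hours_weekly": 40},
    {"Gender": "Female", "Age": 35, "Working_hours_weekly": 40},
]
result = segmented_filtered_mean(window, "Age", "Gender", "Working_hours_weekly", 40)
print(result)  # {'Male': 40.0, 'Female': 38.0}
```

The result contains exactly the two mean metrics listed above, one for Male and one for Female, each computed only over individuals working 40 hours per week.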
4.1 Threshold
You can set up a Threshold when configuring your Validator. A Threshold identifies which values of the metric are considered data quality incidents.
Define your own threshold or let Validio do it for you
You can either define your own threshold rules or let Validio set up a Dynamic threshold for you.
In this example, we configure the following Threshold:
- Threshold type: Dynamic threshold
- Sensitivity: 2
- Decision bounds type: Upper and lower
This Threshold is applied to the two mean metrics in our example. For each metric, dynamic bounds are calculated from historical data, and unusually large or small means are identified as incidents.
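Validio's actual dynamic threshold model is not described here, so the following is only an illustrative stand-in: a common approach where the bounds are the historical mean plus or minus sensitivity times the standard deviation, and a value outside either bound (matching "Upper and lower") counts as an incident. The function names and the mean/standard-deviation rule are assumptions, not Validio's algorithm.

```python
import statistics


def dynamic_bounds(history: list[float], sensitivity: float) -> tuple[float, float]:
    """Lower and upper bound: historical mean +/- sensitivity * standard deviation."""
    mu = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    return mu - sensitivity * sigma, mu + sensitivity * sigma


def is_incident(value: float, history: list[float], sensitivity: float = 2.0) -> bool:
    """Flag values outside both the upper and the lower bound as incidents."""
    lower, upper = dynamic_bounds(history, sensitivity)
    return value < lower or value > upper


# Past mean-Age values for one segment (e.g. Male), one value per window.
history = [40.0, 41.0, 39.0, 40.0, 40.0]
print(is_incident(40.5, history))  # False
print(is_incident(45.0, history))  # True
print(is_incident(35.0, history))  # True
```

In this stand-in, a higher sensitivity widens the bounds and produces fewer incidents; each segment's metric would keep its own history and therefore its own bounds.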
5. Begin your data validation
You can now start your source to begin the data validation:
- Click Start to start the Demo Source.
- Alternatively, navigate to Sources and start the Source connector from the action menu.
Backfill option
Use the Backfill option if you want to read all available historical data in your validation. Otherwise, Validio only reads data available after Source start.
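The difference between starting with and without Backfill can be sketched as a simple selection rule. This is a conceptual illustration only: the arrived_at timestamp field and the function name records_to_validate are hypothetical, and Validio's connectors track what they have read in their own way.

```python
from datetime import datetime
from typing import Any

Record = dict[str, Any]


def records_to_validate(
    records: list[Record], source_started_at: datetime, backfill: bool
) -> list[Record]:
    """Pick the records a validation would read, with or without Backfill."""
    if backfill:
        # Backfill on: include all historical data that is already available.
        return list(records)
    # Backfill off: only data that became available after the Source started.
    return [r for r in records if r["arrived_at"] > source_started_at]


started = datetime(2024, 1, 10)
rows = [
    {"arrived_at": datetime(2024, 1, 1), "Age": 31},
    {"arrived_at": datetime(2024, 1, 5), "Age": 45},
    {"arrived_at": datetime(2024, 1, 12), "Age": 28},
]
print(len(records_to_validate(rows, started, backfill=True)))   # 3
print(len(records_to_validate(rows, started, backfill=False)))  # 1
```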
- Click a specific Validator on the Source details page to view its details and graphs.
6. Done
Congratulations, you have now set up your first data validation!
WHAT'S NEXT
- The Validio platform user interface.
- How to set up a Notification rule for a Validator and get notified about incidents.
- More details on Segmentation.