Getting started
Set up your first data validation in 5 minutes.
1. Install Validio
You can either install Validio in your own VPC or have us host it, whichever suits your needs. For more information, read about our Customer Virtual Private Cloud and our Managed Solution.
We plan to make Validio available on the GCP and AWS marketplaces soon. In the meantime, you can reach out to us at [email protected] to get a demo session and learn more about key features relevant to your data stack!
2. Connect Validio to your data
Once you have Validio running in your environment, you need to connect it to your source.
Finish the set-up before you start your source
Configure your Validators before you start your Source in Validio, to avoid premature reading of data.
- Click on + New Source to start the Source configuration wizard.
In this example, we create a Source of the type Validio Demo Source.

Our Validio Demo Source with credentials and Source name "Demo source connector".

The schema we want to infer from our Source.
Credential and configuration parameters look different depending on the Source
For more information, refer to Credentials and Sources.
2.1 Create a Window
Configure Windows to define the window (batch) of data that is validated in your Source. For example, a Window determines which set of datapoints a mean is calculated and validated on.
- Select which Window type you want to create in the Source configuration wizard.
In this example, we select the Window type Fixed batch window and specify the following:
- Name (of the window): fixed-batch-150
- Data-time field: event_time
- Fixed batch size: 150
This triggers a window every 150 datapoints. The datapoints are ordered by the event_time field.

A fixed batch window on the event_time data-time field with batch size 150.
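The behavior of a fixed batch window can be illustrated with a minimal Python sketch. This is not Validio's implementation or API: the function fixed_batch_windows and the demo data are hypothetical, and the sketch assumes that only complete batches trigger a window.

```python
from datetime import datetime, timedelta


def fixed_batch_windows(datapoints, time_field="event_time", batch_size=150):
    """Return complete batches of `batch_size` datapoints, ordered by `time_field`."""
    ordered = sorted(datapoints, key=lambda row: row[time_field])
    # Assumption: only complete batches trigger a window; leftover datapoints
    # wait until enough new data has arrived.
    return [
        ordered[start:start + batch_size]
        for start in range(0, len(ordered) - batch_size + 1, batch_size)
    ]


# Hypothetical demo data: 400 datapoints, one per minute, arriving out of order.
first_event = datetime(2023, 1, 1)
datapoints = [
    {"event_time": first_event + timedelta(minutes=i), "Age": 20 + i % 40}
    for i in reversed(range(400))
]

windows = fixed_batch_windows(datapoints)
print(len(windows))  # 2 complete windows; the last 100 datapoints are still pending
```

Sorting before slicing mirrors the statement that datapoints are ordered by the event_time field, so each window holds 150 consecutive events regardless of arrival order.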
3. Create Segmentation
Configure Segmentation to define segments in your source to validate metrics on. By default, Validio uses the unsegmented setting. For example, if we create Segmentation on Country, the metrics for Country = USA are validated independently from the metrics for Country = Sweden.
- Click on + New Segmentations to start the Segmentation configuration wizard.
In this example, we create Segmentation on the field Gender, which in this case contains either the value Male or Female.

A Segmentation created on the field Gender.
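Conceptually, Segmentation means that a metric is computed once per distinct value of the segment field. The following Python sketch shows the idea for a mean; the function mean_per_segment and the sample rows are hypothetical and do not reflect how Validio computes metrics internally.

```python
from collections import defaultdict
from statistics import mean


def mean_per_segment(rows, segment_field, metric_field):
    """Group rows by `segment_field` and compute the mean of `metric_field` per group."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[segment_field]].append(row[metric_field])
    # Each segment gets its own metric, independent of the other segments.
    return {segment: mean(values) for segment, values in segments.items()}


# Hypothetical sample rows.
rows = [
    {"Gender": "Male", "Age": 30},
    {"Gender": "Female", "Age": 40},
    {"Gender": "Male", "Age": 50},
    {"Gender": "Female", "Age": 20},
]

segment_means = mean_per_segment(rows, "Gender", "Age")
print(segment_means)
```

Because each segment has its own metric, an unusual value for one segment does not affect how the other segment is validated.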
4. Create a Validator
Configure a Validator to define metrics to validate on specified fields in your source. When you create a Validator you must use a Window. You can also use a created Segmentation and add filters to specify the behavior of the Validator.
- Click on + New Validator to start the Validator configuration wizard.
In this example, we create a Numeric Validator with the following settings:
Config:
- Metric: Mean
- Source field(s): Age and Yearly_wage_USD
- We do not use the Initialize with Backfill option
Source config:
- Segmentation = Gender
- Window = fixed-batch-150
Filter:
- Filter type = Threshold filter
- Field: Working_hours_weekly
- Operator: equal, Value = 40
This Validator calculates the mean of both Age and Yearly_wage_USD, segmented by Gender, after reading 150 datapoints.
Our Threshold filter only includes rows where an individual is working exactly 40 hours per week. All other rows are excluded from the validation.

Validator type configuration.

A Numeric Validator calculating Mean using Source field(s) Age and Yearly_wage_USD, with a Threshold filter on Working_hours_weekly where the operator is Equal and the value is 40.
Metrics in our example
The Validator in our example yields four mean metrics:
- Mean of Age for Male, working exactly 40 hours per week.
- Mean of Age for Female, working exactly 40 hours per week.
- Mean of Yearly_wage_USD for Male, working exactly 40 hours per week.
- Mean of Yearly_wage_USD for Female, working exactly 40 hours per week.
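To see why the example yields exactly four metrics, the filter, the segmentation, and the two source fields can be combined in a short Python sketch. The function validator_metrics and the sample window are hypothetical illustrations, not part of Validio; a real window would hold 150 datapoints.

```python
from statistics import mean


def validator_metrics(window, fields, segment_field, filter_field, filter_value):
    """Compute one mean per (field, segment) pair on the rows that pass the filter."""
    # Threshold filter with the "equal" operator: keep only matching rows.
    kept = [row for row in window if row[filter_field] == filter_value]
    metrics = {}
    for field in fields:
        for segment in sorted({row[segment_field] for row in kept}):
            values = [row[field] for row in kept if row[segment_field] == segment]
            metrics[(field, segment)] = mean(values)
    return metrics


# Hypothetical window, shortened to six rows for readability.
window = [
    {"Gender": "Male", "Age": 30, "Yearly_wage_USD": 50000, "Working_hours_weekly": 40},
    {"Gender": "Male", "Age": 40, "Yearly_wage_USD": 70000, "Working_hours_weekly": 40},
    {"Gender": "Male", "Age": 60, "Yearly_wage_USD": 90000, "Working_hours_weekly": 35},
    {"Gender": "Female", "Age": 28, "Yearly_wage_USD": 52000, "Working_hours_weekly": 40},
    {"Gender": "Female", "Age": 32, "Yearly_wage_USD": 68000, "Working_hours_weekly": 40},
    {"Gender": "Female", "Age": 50, "Yearly_wage_USD": 80000, "Working_hours_weekly": 45},
]

metrics = validator_metrics(
    window,
    fields=["Age", "Yearly_wage_USD"],
    segment_field="Gender",
    filter_field="Working_hours_weekly",
    filter_value=40,
)
for (field, segment), value in metrics.items():
    print(f"Mean of {field} for {segment}: {value}")
```

Two source fields times two segments give four metrics. The rows with 35 and 45 working hours are dropped by the filter before any mean is calculated.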
4.1 Threshold
You can set up a Threshold when configuring your Validator. A Threshold identifies which values of the metric are considered data quality incidents.
Define your own threshold or let Validio do it for you
You can either define your own rules or let Validio set the bounds for you by configuring a Dynamic threshold.
In this example, we configure the following Threshold:
- Threshold type: Dynamic threshold
- Sensitivity: 2
- Decision bounds type: Upper and lower
This Threshold is applied to the four mean metrics in our example. For each metric, dynamic bounds are calculated based on historical data, and unusually large or small means are identified as incidents.

A Dynamic threshold configured with Sensitivity 2 where the Decision bounds type is set to Upper and lower.
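This guide does not describe how Validio calculates dynamic bounds. Purely as an illustration of the idea of upper and lower bounds derived from historical data, the Python sketch below substitutes a simple rule: the historical mean plus or minus Sensitivity times the sample standard deviation. The rule, the function names, and the sample history are assumptions, not Validio's algorithm.

```python
from statistics import mean, stdev


def dynamic_bounds(history, sensitivity=2.0):
    """Return (lower, upper) bounds from past metric values.

    Assumption: bounds are the historical mean +/- sensitivity * sample
    standard deviation. Validio's actual algorithm may differ.
    """
    center = mean(history)
    spread = stdev(history)
    return center - sensitivity * spread, center + sensitivity * spread


def is_incident(value, history, sensitivity=2.0):
    """Flag values outside both the upper and the lower bound."""
    lower, upper = dynamic_bounds(history, sensitivity)
    return value < lower or value > upper


# Hypothetical history of the metric "Mean of Age for Male" over five windows.
history = [40.0, 41.0, 39.0, 40.5, 39.5]

print(dynamic_bounds(history))
print(is_incident(40.8, history))  # False: within the bounds
print(is_incident(45.0, history))  # True: unusually large mean
```

Under this assumed rule, a higher Sensitivity value widens the bounds and flags fewer incidents. How Sensitivity maps to the bounds in Validio itself should be checked against the product documentation.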
5. Begin your data validation
You can now start your Source to begin the data validation:
- Click Start to start the Demo source connector.

Start your source from the Source details page.
- Alternatively, navigate to Sources and start the Source connector from the action menu.

Start your Source from the Sources overview page.
Backfill option
Use the Backfill option if you want to read all available historical data in your validation. Otherwise, Validio only reads data available after Source start.
- Click on a specific Validator in the Source details page to view details and graphs:

Validator details for "Mean of Age where Working_hours_weekly equal 40, segmented on Gender"
6. Done
Congratulations, you have now set up your first data validation!
What's next? Explore:
- The Validio platform user interface.
- How to set up a Notification rule for a Validator and get notified about incidents.
- More details on Segmentation.