Getting started
Set up your first data validation pipeline in 5 minutes
1. Install Validio
Validio can be installed either in your own VPC, or you can choose to have us host it for you - whatever floats your boat! Read more about our Customer Virtual Private Cloud and our Managed Solution here.
Validio will soon be publicly available on the GCP and AWS marketplaces. In the meantime, please reach out to us at [email protected] if you want a demo and want to learn more about the key features relevant to your data stack!
2. Connect Validio to your data
Once you have Validio running in your environment, the first thing you want to do is connect it to your data. This is achieved by creating a Source Connector.
Connecting a source to Validio is done in minutes using the configuration wizard.
Finish the set-up before starting the source connector
To avoid premature ingestion of data, a source connector does not ingest anything until you start it. Make sure you set up your pipeline, monitors, filters, and alerts before you start the source connector!

3. Create your first monitor
Monitors live in dataset pipelines, so your first order of business is to create the pipeline.
3.1 Create a dataset pipeline
To monitor aggregate metrics, e.g. mean, you have to define a dataset or batch to calculate the mean over. The dataset pipeline wizard will help you set this up, including any partitions you may want to have (doing separate validation on subsets of the data, e.g. by segmenting on a country column). Partitions help you compare apples to apples in your data by grouping data by one or more features, e.g. looking at price data partitioned by product category and currency.
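To make the idea of partitioning concrete, here is a minimal sketch in plain Python. It does not use Validio's API; the function name `partitioned_mean`, the column names, and the sample rows are all hypothetical and chosen only to illustrate how a mean is calculated separately for each partition.

```python
from collections import defaultdict
from statistics import mean


def partitioned_mean(rows, feature, partition_by):
    """Return the mean of `feature` for each combination of partition values.

    Hypothetical helper for illustration only; not part of Validio.
    """
    groups = defaultdict(list)
    for row in rows:
        # The partition key is the tuple of values in the partition columns.
        key = tuple(row[col] for col in partition_by)
        groups[key].append(row[feature])
    return {key: mean(values) for key, values in groups.items()}


# Made-up price data, partitioned by product category and currency.
rows = [
    {"category": "books", "currency": "USD", "price": 10.0},
    {"category": "books", "currency": "USD", "price": 14.0},
    {"category": "books", "currency": "EUR", "price": 9.0},
    {"category": "toys", "currency": "USD", "price": 30.0},
]

means = partitioned_mean(rows, "price", ["category", "currency"])
for key, value in sorted(means.items()):
    print(key, value)
```

Each partition gets its own mean, so a price in EUR is never averaged together with a price in USD.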
3.2 Create a monitor on the pipeline
To define which metric to validate on which feature, in this case the mean of age, create a monitor on the pipeline.
4. Set up your first alert
Next, you need to define which values of the mean should be considered data quality failures. You can either define your own rules or let Validio help you set up smart alerts. Whichever approach you choose, you do this by setting up an alert. In this case we set up a manual alert for when the mean of age surpasses 35.
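Conceptually, a manual alert of this kind is a threshold check on the monitored metric. The sketch below shows that logic in plain Python. It is not Validio's implementation or API; the function name `check_mean_alert` and the sample ages are assumptions made for illustration, and "surpasses" is read here as strictly greater than the threshold.

```python
from statistics import mean


def check_mean_alert(values, threshold):
    """Compute the mean of a batch and flag it if it exceeds `threshold`.

    Hypothetical helper for illustration only; not part of Validio.
    """
    batch_mean = mean(values)
    return {
        "metric": "mean",
        "value": batch_mean,
        "threshold": threshold,
        # Strictly greater than: a mean of exactly 35 does not trigger.
        "triggered": batch_mean > threshold,
    }


# Made-up batch of ages from one dataset.
ages = [31, 42, 38, 36]
result = check_mean_alert(ages, threshold=35)
print(result["value"], result["triggered"])
```

In this made-up batch the mean of age is 36.75, so the alert would be triggered.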
5. Start your data validation pipeline
By default, a source connector does not ingest data until you start it. This prevents premature ingestion of data into the Validio platform before you’ve set up all the monitors and alerts you want. Once everything is in place, start your source connector.
6. All set!
Congratulations, you have set up your first data validation pipeline! In a real-world setting you would also want to set up a notification rule and attach it to the pipeline when configuring a dataset or datapoint pipeline, so that you get an email or a Slack notification when an alert is triggered.
What's next? Feel free to:
- Explore the user interface
- Explore datapoint pipelines and filters
- Explore one of our most beloved and powerful features: dataset partitioning