Getting started

Set up your first data validation pipeline in 5 minutes

1. Install Validio

Validio can be installed in your own VPC, or you can choose to host it with us - whatever floats your boat! Read more about our Customer Virtual Private Cloud and our Managed Solution here.

Validio will soon be publicly available on the GCP and AWS marketplaces. In the meantime, please reach out to us at [email protected] if you would like a demo and want to learn more about the key features relevant to your data stack!

2. Connect Validio to your data

Once you have Validio running in your environment, the first thing you want to do is connect it to your data. This is achieved by creating a Source Connector.

Connecting a source to Validio is done in minutes using the configuration wizard.


Finish the set-up before starting the source connector

To avoid premature ingestion of data, a source connector does not ingest anything until you start it. Make sure you set up your pipeline, monitors, filters and alerts before you start the source connector!


3. Create your first monitor

Monitors live in Dataset pipelines, so your first order of business is to create the pipeline.

3.1 Create a dataset pipeline

To monitor aggregate metrics, e.g. the mean, you have to define a dataset or batch to calculate the metric over. The dataset pipeline wizard will help you set this up, including any partitions you may want (partitions validate subsets of the data separately, e.g. by segmenting on a country column). Partitions help you compare apples to apples by grouping data on one or more features, e.g. looking at price data partitioned by product category and currency.
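To make the idea of partitions concrete, here is a minimal sketch in plain Python of what "a mean per partition" amounts to. It is an illustration only and does not use the Validio API: the function `partitioned_mean`, the field names and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def partitioned_mean(rows, value_field, partition_fields):
    """Return the mean of value_field for each combination of partition values.

    Hypothetical helper for illustration; not part of Validio.
    """
    groups = defaultdict(list)
    for row in rows:
        # The partition key is the tuple of values in the partition columns.
        key = tuple(row[field] for field in partition_fields)
        groups[key].append(row[value_field])
    return {key: mean(values) for key, values in groups.items()}


# Made-up price data, partitioned by product category and currency.
rows = [
    {"category": "books", "currency": "USD", "price": 10.0},
    {"category": "books", "currency": "USD", "price": 14.0},
    {"category": "books", "currency": "EUR", "price": 9.0},
    {"category": "toys", "currency": "USD", "price": 30.0},
]

means = partitioned_mean(rows, "price", ["category", "currency"])
for key, value in sorted(means.items()):
    print(key, value)
```

Each partition gets its own mean, so a price shift in one category or currency is not hidden by the rest of the data.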

3.2 Create a monitor on the pipeline

A monitor defines which metric to validate on which feature. In this case we want to validate the mean of age, so we create a monitor for it on the pipeline.

4. Set up your first alert

Next, you need to define which values of the mean should be considered data quality failures. You can either define your own rules or let Validio help you set up smart alerts. Whichever approach you choose, you do this by setting up an alert. In this case we set up a manual alert that triggers when the mean of age surpasses 35.
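Conceptually, a manual alert of this kind is a threshold check on the monitored metric. The sketch below shows that logic in plain Python. It is not Validio's implementation or API: `mean_exceeds` and the sample ages are hypothetical, and it assumes "surpasses" means strictly greater than the threshold.

```python
from statistics import mean


def mean_exceeds(values, threshold):
    """Return (triggered, batch_mean) for a batch of values.

    Hypothetical helper for illustration; the alert triggers only when
    the batch mean is strictly greater than the threshold.
    """
    batch_mean = mean(values)
    return batch_mean > threshold, batch_mean


# Made-up batch of ages, checked against the threshold of 35 used above.
ages = [29, 41, 38, 36]
triggered, batch_mean = mean_exceeds(ages, 35)
print(f"mean of age = {batch_mean}, alert triggered: {triggered}")
```

A batch whose mean is exactly 35 would not trigger under this assumption. Whether the boundary value counts as a failure is a choice you make when you define your own rule.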

5. Start your data validation pipeline

Source connectors are not started by default, so no data is ingested into the Validio platform before you've set up all the monitors and alerts you want. Once everything is in place, start your source connector.

6. All set!

Congratulations, you have set up your first data validation pipeline! In a real-world setting you would also want to set up a notification rule and attach it to a pipeline when configuring a dataset or datapoint pipeline, so that you get an email or a Slack notification when an alert is triggered.

What's next? Feel free to: