Configuration

A source connector integrates the platform with your data source, enabling data to be ingested into the platform so that it can be monitored and validated.

The source connector wizard guides you through setting up and configuring your connector. The wizard consists of three steps: selecting the connector type, configuring the connector, and defining the schema of the data.

📘

Start the source connector to ingest data

To avoid premature ingestion, a source connector does not ingest any data until it has been started. Make sure your pipelines, monitors, filters, and alerts are set up before you start the connector.

1. Selecting type

Validio’s data quality platform supports a wide range of integrations with different cloud providers, data warehouses, streaming technologies and object stores. Select the type of integration that you’d like to use for your source connector.

2. Configuration

Start configuring your source connector by giving it a descriptive name, then fill out the required configuration. The required fields depend on the connector type; for the source-specific configuration parameters, refer to the Source Connector pages in the navigation on the left.

While configuring your connector, you also have the option to create a metadata connector. A metadata connector lets you monitor metadata about the source, such as freshness, row count, and ingestion rates.

3. Defining schema

3.1 Automatic schema inference

A schema must be defined so that the platform knows the data type of each feature and how to validate the data. To simplify this, the platform first attempts to infer the schema from your data. Depending on the connector type, this might take a few seconds.

🚧

For automatic schema inference to work, make sure the source actually contains data, e.g. that a stream isn't empty and that a file in an object store contains data.

3.2 Schema inference using a JSON file

If inference succeeds, the wizard presents a suggested schema. If it fails, you will be asked to upload a JSON file with some sample data; the wizard then tries to infer the schema from that file instead, choosing each feature's data type by majority vote across the sample records. Make sure that the inferred data types match what the actual data looks like.

📘

Validio supports un-nested data

The schema inference expects a JSON file containing an array of objects. For each root-level field of the objects, Validio infers the data type of that field. Fields with non-scalar values (e.g. objects and arrays) are ignored.
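The majority-vote inference over a flat JSON array can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Validio's implementation: the type labels (`boolean`, `integer`, `float`, `string`) and the function names `scalar_type` and `infer_schema` are hypothetical, null and non-scalar values are assumed not to cast a vote, and ties are assumed to go to the type seen first.

```python
import json
from collections import Counter


def scalar_type(value):
    """Return a hypothetical type label for a scalar JSON value, else None."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    return None  # null, objects (dict) and arrays (list) do not vote


def infer_schema(sample_json: str) -> dict:
    """Infer a flat schema from a JSON array of objects by majority vote per field."""
    records = json.loads(sample_json)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")

    votes = {}  # field name -> Counter of type labels
    for record in records:
        for field, value in record.items():  # root-level fields only
            label = scalar_type(value)
            if label is not None:
                votes.setdefault(field, Counter())[label] += 1

    # The most frequent type wins; most_common() keeps first-seen order on ties.
    return {field: counts.most_common(1)[0][0] for field, counts in votes.items()}


SAMPLE = """[
  {"id": 1, "price": 9.5, "country": "SE", "tags": ["a"]},
  {"id": 2, "price": 10, "country": "NO", "meta": {"x": 1}},
  {"id": "3", "price": 7.25, "country": null}
]"""

print(infer_schema(SAMPLE))
# {'id': 'integer', 'price': 'float', 'country': 'string'}
```

In this sample, `id` is inferred as `integer` even though one record holds the string `"3"`, which is why it is worth checking that the inferred types match the actual data. The `tags` and `meta` fields are dropped because their values are an array and an object.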

You can change an inferred data type using the drop-down on the right-hand side of each feature. You can also exclude features found during inference by deselecting the checkbox on the left of each feature.

After you have completed the wizard, you will be able to start using the connector when setting up new pipelines or as a reference inside a monitor.

3.3 Nullable features

For each feature, you can check the 'nullable' checkbox. Check it for every feature you want to apply a null monitor or null filter to.

