Configuring a Source

Configure a Source connector to integrate your data source with Validio. Validio can then read the data for validation and monitoring purposes.

1. Select type

From the Sources page, click + New source and select the Source type you want to connect.

Source configuration wizard - Select Source type.


📘

Create Source from Lineage

It is also possible to create a Source from the Lineage page.


2. Credentials

You can either select an existing Credential that has access to the Source, or create a new one.

📘

Credential parameters look different depending on the Source

For more information, refer to the respective Source type page.

Source configuration wizard - Select or create Credential.



3. Config

Specify which data asset you want Validio to read. You can either configure the assets manually, or choose from listed suggestions if available:

  • Available datasets: If Validio has permission to read the associated datasets within a project, you can select these datasets from your Source.
  • Recent browser selections: If Validio does not have permission, your browser shows recent selections for each field.

📘

Configuration parameters look different depending on the Source

For more information, refer to the respective Source type page.

Source configuration wizard - Source specific config parameters


3.1 Polling interval

For Data warehouses and Object storage, you must set the polling interval parameter to specify how often Validio reads the data.

📘

Cron presets or custom 5-field expression

You can set this parameter either by choosing one of the presets or by typing a custom expression in the cron expression field.

For cron schedule expressions, refer to a cron editor, such as https://crontab.guru/.
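As a rough illustration of what a custom expression contains, the sketch below splits a standard 5-field cron expression into named fields. The function name `split_cron` and the field labels are chosen for this example only and are not part of Validio; the sketch checks the number of fields but does not validate their values.

```python
# Illustrative only: labels follow the standard 5-field cron layout.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict[str, str]:
    """Split a 5-field cron expression into named fields (hypothetical helper)."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


# Poll every 15 minutes.
print(split_cron("*/15 * * * *"))
# Poll once a day at 06:00.
print(split_cron("0 6 * * *"))
```

An expression with more or fewer than five fields, such as `0 6 * *`, raises a `ValueError` in this sketch.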

For Data streams, the polling interval parameter is not available, since data is read as soon as it is available from the stream processor.

📘

Manual polling

In addition to the cron schedule, you can manually poll a Source, either using the web interface, or using the CLI.


4. Schema

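Configure the schema for the data source you want to validate. Depending on the data source and data types, you can select nested fields, change nullability and data types, and select a cursor field, as described in the following sections.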

Source configuration wizard - Define schema.


4.1 Nested fields

Validio supports semi-structured and complex data types, including arrays and nested fields. You can select all nested fields or specify which ones you want to include for further validation.

For more information on semi-structured and complex data types, refer to Semi-structured data.
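To make the idea of selecting nested fields concrete, here is a minimal sketch that lists the leaf fields of a nested record and then extracts a chosen subset. The dotted-path notation, the helper names `nested_paths` and `select_fields`, and the sample record are assumptions made for this example; they do not describe how Validio represents or reads nested fields.

```python
from typing import Any


def nested_paths(record: dict[str, Any], prefix: str = "") -> list[str]:
    """List leaf fields of a nested record as dotted paths (hypothetical helper)."""
    paths: list[str] = []
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects; arrays are treated as leaves here.
            paths.extend(nested_paths(value, path))
        else:
            paths.append(path)
    return paths


def select_fields(record: dict[str, Any], selected: list[str]) -> dict[str, Any]:
    """Extract only the selected dotted paths from a record."""
    result: dict[str, Any] = {}
    for path in selected:
        value: Any = record
        for key in path.split("."):
            value = value[key]
        result[path] = value
    return result


# Made-up sample record for illustration.
event = {
    "id": 1,
    "user": {"name": "Ada", "address": {"city": "Stockholm"}},
    "tags": ["a", "b"],
}

print(nested_paths(event))
# ['id', 'user.name', 'user.address.city', 'tags']
print(select_fields(event, ["user.address.city"]))
# {'user.address.city': 'Stockholm'}
```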

Source configuration wizard - View and select nested fields.


4.2 Nullable fields and data types

For inferred schemas, you can change nullability and data types. This is useful when the inferred schema does not match the expectations on incoming data.

🚧

If NULL exists in a field where the nullable checkbox is not selected, this particular datapoint is not included in the Validator metrics.

For example, in a row count Validator, the datapoint is ignored. To validate null values, for example with a share-of-null metric, you must select the nullable checkbox for the field.
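The effect of the nullable checkbox on metrics can be pictured with a toy model. The sketch below is not Validio's implementation: the function `included_datapoints`, the `nullable` mapping, and the sample rows are hypothetical, and the code only mimics the behavior described above, where a datapoint with NULL in a non-nullable field is left out.

```python
from typing import Any

# Made-up sample rows; None stands in for NULL.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 7.5},
]


def included_datapoints(
    rows: list[dict[str, Any]], nullable: dict[str, bool]
) -> list[dict[str, Any]]:
    """Keep rows whose None values only occur in fields marked nullable."""
    return [
        row
        for row in rows
        if all(value is not None or nullable.get(field, False)
               for field, value in row.items())
    ]


# "amount" not nullable: the row with a missing amount is ignored.
strict = included_datapoints(rows, {"order_id": False, "amount": False})
print(len(strict))  # 2

# "amount" nullable: all rows count, so a share-of-null metric is meaningful.
lenient = included_datapoints(rows, {"order_id": False, "amount": True})
null_share = sum(row["amount"] is None for row in lenient) / len(lenient)
print(len(lenient), round(null_share, 2))  # 3 0.33
```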

Source configuration wizard - Optionally, select `nullable fields`.


4.3 Cursor field

When you configure Data Warehouse Source types, you can select a cursor field and a lookback time for Validio to read the data.

🚧

Lookback time

The lookback time specifies how far back in time Validio reads data from your source. A longer lookback time can lead to longer query times and increased costs when data is backfilled.

For information on cursor field considerations, refer to Data Warehouses.
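The relationship between the cursor field and the lookback time can be sketched as a simple time filter. This is an illustration under stated assumptions, not a description of the queries Validio runs: the function `rows_in_lookback`, the `updated_at` field, the sample rows, and the inclusive cutoff are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def rows_in_lookback(
    rows: list[dict[str, Any]],
    cursor_field: str,
    lookback: timedelta,
    now: datetime,
) -> list[dict[str, Any]]:
    """Keep rows whose cursor value falls within the lookback period."""
    cutoff = now - lookback
    return [row for row in rows if row[cursor_field] >= cutoff]


# Fixed reference time and made-up rows so the example is reproducible.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 30, tzinfo=timezone.utc)},
]

# A 7-day lookback reads only the recent row.
recent = rows_in_lookback(rows, "updated_at", timedelta(days=7), now)
print([row["id"] for row in recent])  # [2]

# A 60-day lookback reads both rows, at the cost of scanning more data.
backfill = rows_in_lookback(rows, "updated_at", timedelta(days=60), now)
print([row["id"] for row in backfill])  # [1, 2]
```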

Source configuration wizard - Select `Cursor field` and `Lookback time` for Data Warehouse Source types.



5. Window

Select a Window type and configure your Window.

Source configuration wizard - Configure a Window.
