Configuring a Source
Configure a Source connector to connect your data source to Validio. Validio can then read the data for validation and monitoring.
Select a Source Type
From the Sources page, click + New source and select the Source type you want to connect.
Note
You can also create a Source from assets in the Catalog and Lineage pages.
Select a Valid Credential
You can either select an existing Credential that has access to the Source or create a new one. Credential parameters differ depending on the Source. For more information, refer to the page for the respective Source type.
Configure the Dataset
Specify which data asset you want Validio to read. You can either configure the asset manually or choose from listed suggestions, if available. Configuration parameters differ depending on the type of Source you configure.
- Available datasets: If Validio has permission to read the datasets within a project, you can select these datasets from your Source.
- Recent browser selections: If Validio does not have permission, your browser shows your recent selections for each field.
Polling interval
For Data warehouses and Object storage, you must set the polling interval parameter to specify how often Validio reads the data.
Set the polling interval by choosing one of the presets or by typing a cron expression into the cron expression field. For help writing cron schedule expressions, use a cron editor such as https://crontab.guru/.
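Cron expressions can be hard to read at a glance. As an illustration only, the following Python sketch checks whether a given minute falls on the schedule of a standard five-field cron expression (minute, hour, day of month, month, day of week). It supports `*`, single values, ranges, steps, and comma lists. It is not Validio code, and it does not claim that Validio evaluates polling schedules this way; the function names `parse_field` and `cron_matches` are hypothetical.

```python
from datetime import datetime

# Allowed value range for each of the five standard cron fields.
FIELD_RANGES = [
    (0, 59),  # minute
    (0, 23),  # hour
    (1, 31),  # day of month
    (1, 12),  # month
    (0, 6),   # day of week (0 = Sunday; 7 is not accepted in this sketch)
]


def parse_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field such as "*", "*/15", "1-5", or "0,30" into a set of values."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        has_step = "/" in part
        if has_step:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(part)
            # "5/10" means "from 5 up to the maximum, every 10".
            end = high if has_step else start
        if step < 1 or start < low or end > high or start > end:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if the minute containing `moment` is a scheduled run time."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minutes, hours, days, months, weekdays = (
        parse_field(text, low, high) for text, (low, high) in zip(fields, FIELD_RANGES)
    )
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    dom_ok = moment.day in days
    dow_ok = cron_weekday in weekdays
    # Classic cron rule: if both day fields are restricted (do not start with "*"),
    # a run is due when either one matches; otherwise both must match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (
        moment.minute in minutes
        and moment.hour in hours
        and moment.month in months
        and day_ok
    )


if __name__ == "__main__":
    # "0 * * * *" runs every hour, on the hour.
    print(cron_matches("0 * * * *", datetime(2024, 1, 1, 13, 0)))  # True
    print(cron_matches("0 * * * *", datetime(2024, 1, 1, 13, 5)))  # False
    # "0 9 * * 1-5" runs at 09:00 on weekdays; 2024-01-06 is a Saturday.
    print(cron_matches("0 9 * * 1-5", datetime(2024, 1, 6, 9, 0)))  # False
```

A tool such as https://crontab.guru/ gives the same kind of plain-language reading of an expression; the sketch only makes the field-by-field matching explicit.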
For Data streams, you do not configure polling, because Validio reads the data as soon as it is available from the stream processor.
Important
You can configure polling to run on a schedule using cron presets or expressions, or you can run polling manually from the web interface or the CLI. However, you cannot use both methods on the same Source.
Configure the Schema
Configure the schema for the data source you want to validate. Depending on data source and data types, you can:
- Select the fields you want to validate
- Select any nested fields to include
- Set nullability and data types for the data in the Source
Nested fields
Validio supports semi-structured and complex data types, including arrays and nested fields. You can include all nested fields or select only the nested fields you want to validate.
For more information on semi-structured and complex data types, refer to Semi-structured data.
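To picture what selecting nested fields means, the following Python sketch lists the leaf field paths of a nested record and then keeps only a subset. The dotted path notation and the "[]" suffix for arrays are assumptions made for this illustration; they are not necessarily how Validio names nested fields, and `field_paths` is a hypothetical helper.

```python
from typing import Any


def field_paths(record: dict[str, Any], prefix: str = "") -> list[str]:
    """List leaf field paths in a nested record, for example "customer.address.city".

    Array fields get a "[]" suffix; for arrays of objects, the first element is
    used as a sample of the element structure.
    """
    paths: list[str] = []
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and extend the path with a dot.
            paths.extend(field_paths(value, path + "."))
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            # Array of objects: describe the fields of the elements.
            paths.extend(field_paths(value[0], path + "[]."))
        elif isinstance(value, list):
            # Array of scalars (or an empty array): treat as one leaf field.
            paths.append(path + "[]")
        else:
            paths.append(path)
    return paths


if __name__ == "__main__":
    record = {
        "id": 1,
        "customer": {"name": "Ada", "address": {"city": "Oslo"}},
        "tags": ["a", "b"],
        "items": [{"sku": "X1", "qty": 2}],
    }
    all_paths = field_paths(record)
    print(all_paths)
    # ['id', 'customer.name', 'customer.address.city', 'tags[]', 'items[].sku', 'items[].qty']

    # Include only the nested fields under "customer".
    print([p for p in all_paths if p.startswith("customer.")])
    # ['customer.name', 'customer.address.city']
```

Selecting all nested fields corresponds to keeping every path; selecting specific ones corresponds to filtering the list, as in the last step.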
Nullable fields and data types
For inferred schemas, you can change nullability and data types. This is useful when the inferred schema does not match what you expect of the incoming data.
Caution
If NULL exists in a field where the nullable checkbox is not selected, that datapoint is not included in the Validator metrics. For example, a row count Validator ignores the datapoint. To validate null values, for example with a share of null metric, you must select the nullable checkbox.
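The caution above can be illustrated with a toy model in Python. It assumes that NULL values are simply dropped before metrics are computed when the field is not marked nullable, which is a simplification of the behavior described above, not Validio's implementation. The helper names `datapoints_for_validation`, `row_count`, and `share_of_null` are hypothetical.

```python
from typing import Optional

Value = Optional[float]  # None stands in for SQL NULL


def datapoints_for_validation(values: list[Value], nullable: bool) -> list[Value]:
    """Toy model: when the field is not marked nullable, NULL datapoints are dropped."""
    if nullable:
        return list(values)
    return [v for v in values if v is not None]


def row_count(points: list[Value]) -> int:
    """Number of datapoints that reach the metric."""
    return len(points)


def share_of_null(points: list[Value]) -> float:
    """Fraction of datapoints that are NULL (0.0 if there are none)."""
    if not points:
        return 0.0
    return sum(1 for v in points if v is None) / len(points)


if __name__ == "__main__":
    raw = [1.0, None, 3.0, None]
    for nullable in (False, True):
        points = datapoints_for_validation(raw, nullable)
        print(nullable, row_count(points), share_of_null(points))
    # False 2 0.0
    # True 4 0.5
```

With the field not marked nullable, the two NULL rows never reach the metrics, so the row count drops to 2 and the share of null is 0.0 even though half the data is NULL.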
Cursor field
When you configure a Data warehouse Source, you can select a cursor field and a lookback time that determine how Validio reads the data.
For information on cursor field considerations, refer to Data Warehouses.
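The following Python sketch shows one common pattern for incremental reads with a cursor field and a lookback time. It assumes that each poll reads only rows whose cursor value is newer than the last value already read and no older than the poll time minus the lookback. This interpretation is an assumption for illustration; refer to Data Warehouses for how Validio actually uses these settings. The `Row` type, the `updated_at` cursor field, and the `poll` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Row:
    id: int
    updated_at: datetime  # hypothetical cursor field


def poll(
    rows: list[Row],
    poll_time: datetime,
    lookback: timedelta,
    last_cursor: Optional[datetime],
) -> tuple[list[Row], Optional[datetime]]:
    """Select rows for one poll and return them with the updated cursor position."""
    oldest_allowed = poll_time - lookback
    selected = [
        row
        for row in rows
        # Never read data older than the lookback window or newer than the poll time...
        if oldest_allowed <= row.updated_at <= poll_time
        # ...and skip rows already read in an earlier poll.
        and (last_cursor is None or row.updated_at > last_cursor)
    ]
    # Advance the cursor to the newest value read; keep the old one if nothing was read.
    new_cursor = max((row.updated_at for row in selected), default=last_cursor)
    return selected, new_cursor


if __name__ == "__main__":
    base = datetime(2024, 1, 10, 12, 0)
    rows = [
        Row(1, base - timedelta(days=3)),      # outside a 1-day lookback
        Row(2, base - timedelta(hours=1)),
        Row(3, base - timedelta(minutes=10)),
    ]
    first, cursor = poll(rows, base, timedelta(days=1), None)
    print([r.id for r in first])   # [2, 3]

    rows.append(Row(4, base + timedelta(minutes=30)))
    second, cursor = poll(rows, base + timedelta(hours=1), timedelta(days=1), cursor)
    print([r.id for r in second])  # [4]
```

In this sketch, the strict `>` comparison prevents re-reading the last row, but it would also skip a late-arriving row that has exactly the same cursor value. Edge cases like this are one reason the choice of cursor field matters.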
Window
Select a Window type and configure your Window.