
Configuring a Source

Configure a Source connector to integrate your data source with Validio. Connecting a source enables Validio to read its data for validation and monitoring.

To configure a Source,

  1. Navigate to Sources and click + New source.

    📘

    Convert Assets to Sources

    You can also create a source by converting assets in the Catalog and Lineage pages.

  2. Under Source type, select the type of source you want to connect.
  3. Under Config,
    1. Select a valid Credential, or create a new one, to authenticate your connection to the data source.
    2. Specify which dataset you want Validio to read, or where on the source the data comes from. For more information, refer to Configure the Dataset.
    3. Set how many days of Historic data to use when you start the source.
    4. Set the Polling schedule, which is how frequently the validators on the source will check for changes.
  4. Under Schema, select the fields to include in the schema. For more information, refer to Configure the Schema.
  5. Under Source details,
    1. Add Tags to help group related sources or to use for routing notifications.
    2. Add an Owner who will be the contact for incident notifications.
  6. Click Continue to create the source.
    Source names are generated automatically and displayed when source creation completes. If you create more than five sources, you see the names of the first five and a count of the remaining sources.
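The creation summary follows a simple display rule: list up to five names, then count the rest. The sketch below models that rule only; the function name `summarize_source_names` and the exact message wording are hypothetical, not Validio's UI text.

```python
def summarize_source_names(names: list[str], limit: int = 5) -> str:
    """Return the first `limit` names, plus a count of any remaining ones (hypothetical format)."""
    shown = names[:limit]
    remaining = len(names) - len(shown)
    summary = ", ".join(shown)
    if remaining > 0:
        # More sources than the limit: summarize the rest as a count.
        summary += f" and {remaining} more"
    return summary


print(summarize_source_names(["orders", "users", "events", "payments", "refunds", "sessions", "clicks"]))
# orders, users, events, payments, refunds and 2 more
```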

Configure the Dataset

Specify which data asset you want Validio to read. You can either configure the assets manually or choose from suggestions when they are available. The configuration parameters differ depending on the type of source you configure.

  • Available datasets: If Validio has permission to read the associated datasets within a project, you can select these datasets from your source.
  • Recent browser selections: If Validio does not have this permission, your browser shows your recent selections for each field.

Polling Schedule

For Data warehouses and Object storage, you must set the polling interval parameter to specify how often Validio reads the data.

Set the polling interval by choosing one of the presets or by entering a cron expression in the cron expression field. For help writing cron expressions, use a cron editor such as https://crontab.guru/.

For Data streams, you do not configure polling for data, since data is read as soon as it is available from the stream processor.

โ—๏ธ

Configure Polling with the UI or the CLI

You can configure polling to run on a schedule, using cron presets or expressions, or you can poll manually using the web interface or the CLI. However, you cannot use both methods on the same source.


Configure the Schema

Configure the schema for the data source you want to validate. Depending on the data source and data types, you can select which fields, including nested fields, you want to validate and set nullability for the data in the source.

Nested Fields

Validio supports semi-structured and complex data types, including arrays and nested fields. You can include all nested fields or select only the nested fields you want to validate.

For more information on semi-structured and complex data types, refer to Semi-structured Data.
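To illustrate what selecting nested fields means, the sketch below flattens a nested record into dotted field paths and keeps only a chosen subset. The dotted-path notation, the helper `flatten_fields`, and the sample record are assumptions for illustration; Validio's own field path syntax may differ.

```python
from typing import Any


def flatten_fields(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the path.
            flat.update(flatten_fields(value, path))
        else:
            # Arrays and scalars are kept as leaf values.
            flat[path] = value
    return flat


record = {"id": 7, "customer": {"name": "Ada", "address": {"city": "Oslo", "zip": "0150"}}, "tags": ["new", "vip"]}
selected = {"id", "customer.address.city"}

fields = flatten_fields(record)
print(sorted(fields))
# ['customer.address.city', 'customer.address.zip', 'customer.name', 'id', 'tags']
print({path: value for path, value in fields.items() if path in selected})
# {'id': 7, 'customer.address.city': 'Oslo'}
```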

Nullable Fields and Data Types

For inferred schemas, you can change nullability and data types. This is useful when the inferred schema does not match your expectations for the incoming data.
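Schema inference can only reflect the data it observes, so a sample without NULLs yields a non-nullable field even if NULLs appear later. The sketch below is a simplified model, not Validio's inference logic: the helper `infer_schema` and its type names are hypothetical. It derives a type and nullability per field from sample rows, then overrides them as you would in the schema settings.

```python
from typing import Any


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Infer a type name and nullability per field from the values seen in sample rows."""
    schema: dict[str, dict[str, Any]] = {}
    for row in rows:
        for field, value in row.items():
            entry = schema.setdefault(field, {"type": None, "nullable": False})
            if value is None:
                # Any observed NULL marks the field as nullable.
                entry["nullable"] = True
            elif entry["type"] is None:
                # The first non-NULL value determines the inferred type.
                entry["type"] = type(value).__name__
    return schema


sample = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
schema = infer_schema(sample)
print(schema["email"])  # {'type': 'str', 'nullable': False}

# The sample had no NULL emails, but incoming data may: override the inferred nullability.
schema["email"]["nullable"] = True
# Suppose IDs should be treated as strings even though the sample held integers.
schema["id"]["type"] = "str"
```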

🚧

Null Fields and Metrics

If a field contains NULL but its nullable checkbox is not selected, that datapoint is excluded from the Validator metrics.

For example, a row count Validator ignores the datapoint. To validate null values, for example to measure the share of null values, you must select the nullable checkbox.
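A simplified model of this rule: with the nullable checkbox cleared, datapoints holding NULL are dropped before metrics are computed, so both the row count and the share of null change. The helper `field_metrics` is hypothetical and only illustrates the documented behavior; it is not how Validio computes metrics internally.

```python
from typing import Any, Optional


def field_metrics(values: list[Optional[Any]], nullable: bool) -> dict[str, float]:
    """Compute row count and share of null for one field under the nullability rule above."""
    if not nullable:
        # NULLs in a non-nullable field are dropped before any metric is computed.
        values = [v for v in values if v is not None]
    row_count = len(values)
    null_count = sum(1 for v in values if v is None)
    share_of_null = null_count / row_count if row_count else 0.0
    return {"row_count": row_count, "share_of_null": share_of_null}


values = [10, None, 30, None]
print(field_metrics(values, nullable=False))  # {'row_count': 2, 'share_of_null': 0.0}
print(field_metrics(values, nullable=True))   # {'row_count': 4, 'share_of_null': 0.5}
```

With the checkbox cleared, the share of null is always 0, which is why null values can only be validated on nullable fields.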