Configuring a Source
Configure a Source connector to enable Validio to read its data for validation and monitoring. This page discusses the general source configuration steps. You can find instructions for specific source integrations on their dedicated pages.
Converting Assets to Sources
If you already have credentials configured, you can also create a source by converting discovered assets on the Catalog and Lineage pages. For more information, see Using Catalog and Using Lineage.
To configure a Source:
- Navigate to Sources and click + New source.
- Under Source type, select the type of source you want to connect.
- Under Config:
  - Select a valid Credential, or create a new credential, to authenticate your connection to the data source.
  - Specify which dataset you want Validio to read, or where in the source the data comes from. You can select the table or use a SQL query to specify the source table. For more information, refer to Configure the Dataset.
  - Set the Polling schedule, using a cron expression to specify how frequently the validators on the source check for changes.
- Under Schema, select the fields to include in the schema. For more information, refer to Configure the Schema.
- Under Source details:
  - Add Tags to help group related sources or to use for routing notifications.
  - Add an Owner who will be the contact for incident notifications.
- Click Continue to create the source.
Source names are generated automatically and are displayed when source creation completes. If there are more than five sources, you will see the names of the first five and a count of the remaining sources.
Configure the Dataset
Specify which data asset you want Validio to read. You can either configure the assets manually, or choose from listed suggestions if available. Configuration parameters look different depending on the type of Source you configure.
- Available datasets: If Validio has permission to read associated datasets within a project, you can select these datasets from your Source.
- Custom SQL query: You can write a valid SQL query to specify the source table. The source table template function is required in your query.
Polling Schedule
For Data warehouses and Object storage, you must set the polling interval parameter to specify how often Validio reads the data.
You can set the polling interval parameter with one of the presets or type it into the cron expression field. For cron schedule expressions, refer to a cron editor, such as https://crontab.guru/.
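To illustrate how a five-field cron expression maps to a polling schedule, the sketch below checks whether a given time matches an expression. It is a simplified, standard-library-only model: it supports *, */n, single values, ranges, and comma-separated lists, and it requires every field to match. It is not how Validio evaluates schedules, and the helper names field_matches and cron_matches are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int, minimum: int) -> bool:
    """Return True if one cron field accepts the given value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            # Wildcard: any value from the field's minimum upward.
            start, end = minimum, value
        elif "-" in part:
            low, high = part.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Check a 'minute hour day month weekday' expression against a time."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0; isoweekday() counts Sunday as 7.
    cron_weekday = moment.isoweekday() % 7
    return (
        field_matches(minute, moment.minute, 0)
        and field_matches(hour, moment.hour, 0)
        and field_matches(day, moment.day, 1)
        and field_matches(month, moment.month, 1)
        and field_matches(weekday, cron_weekday, 0)
    )


# Hourly polling fires at minute 0 of every hour.
hourly = cron_matches("0 * * * *", datetime(2024, 1, 1, 10, 0))
# Polling every 15 minutes does not fire at minute 10.
quarter = cron_matches("*/15 * * * *", datetime(2024, 1, 1, 10, 10))
```

In this sketch, hourly is True and quarter is False. A real cron implementation also handles names such as MON and the special rule for combining day-of-month with day-of-week, which are left out here.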
For Data streams, you do not configure polling for data, since data is read as soon as it is available from the stream processor.
Configure Polling with the UI or the CLI
You can configure polling to run on a schedule using cron presets or expressions, or you can configure polling manually using the web interface or the CLI. However, you cannot do both on one source.
Configure the Schema
Configure the schema for the data source you want to validate. Depending on the data source and data types, you can select which fields, including nested fields, you want to validate and set nullability for the data in the source. Depending on your Source type, it might also take a few seconds to infer the schema.
If data exists in the source, Validio can infer the schema:
- Schema from metadata: Validio reads the schema from the metadata in the data source, for example, from INFORMATION_SCHEMA in a Data Warehouse. This is true for most structured data types.
- Schema from inference: Validio infers the schema from the existing data when no pre-defined schema exists. This is true for most semi-structured data types, for example JSON.
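To make the idea of inferring a schema from existing data concrete, here is a minimal sketch that derives field names, types, and nullability from a few sample JSON records. It assumes a field is nullable when it is null or missing in any record. The function name infer_schema and the sample records are hypothetical and do not reflect Validio's actual inference logic.

```python
import json


def infer_schema(records: list[dict]) -> dict[str, dict]:
    """Collect observed types and nullability for each top-level field."""
    schema: dict[str, dict] = {}
    for record in records:
        for name, value in record.items():
            entry = schema.setdefault(name, {"types": set(), "nullable": False})
            if value is None:
                entry["nullable"] = True
            else:
                entry["types"].add(type(value).__name__)
    # A field that is absent from any record is also treated as nullable.
    for name, entry in schema.items():
        if any(name not in record for record in records):
            entry["nullable"] = True
    return schema


# Hypothetical sample data, one JSON document per line.
raw_lines = [
    '{"id": 1, "name": "a", "score": 0.5}',
    '{"id": 2, "name": null}',
]
records = [json.loads(line) for line in raw_lines]
schema = infer_schema(records)
```

Here id is inferred as a non-nullable int, name as a nullable str, and score as a nullable float because it is missing from the second record. This is the kind of result you might then adjust by hand when it does not match your expectations of incoming data.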
Nested Fields
Validio supports semi-structured and complex data types, including arrays and nested fields. You can select all nested fields or specify which ones you want to include for further validation.
JSONPath Expressions
Validio uses JSONPath to represent data structures. For example, the JSONPath expression some_array.length() represents the size of an array.
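The sketch below shows, in plain Python, what addressing nested fields and taking an array's size amounts to. It walks a sample document, lists dotted paths for its nested fields, and computes the value that an expression such as some_array.length() would represent. The helper nested_paths and the sample document are hypothetical; the code does not use a JSONPath library or Validio's own path evaluation.

```python
def nested_paths(value, prefix=""):
    """Yield dotted paths for every leaf field in a nested document."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from nested_paths(child, path)
    else:
        # Arrays and scalars are treated as leaves in this sketch.
        yield prefix


# Hypothetical sample document with a nested object and an array.
document = {
    "user": {"id": 7, "address": {"city": "Oslo"}},
    "some_array": [10, 20, 30],
}

paths = list(nested_paths(document))
# The size that some_array.length() stands for in this document.
array_size = len(document["some_array"])
```

For this document, paths contains user.id, user.address.city, and some_array, and array_size is 3.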
Nullable Fields and Data Types
For inferred schemas, you can change nullability and data types. This is useful when the inferred schema does not match the expectations on incoming data.
Null Fields and Metrics
If NULL exists in a field where the nullable checkbox is not selected, this particular datapoint is not included in the Validator metrics. For example, in a row count Validator, the datapoint is ignored. You must select the nullable checkbox to validate null values, such as share of null.
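The effect of the nullable setting on metrics can be modeled with a small sketch. It assumes a simplified model in which a datapoint with a null in a non-nullable field is dropped before any metric is computed. The functions usable_rows, row_count, and share_of_null are hypothetical stand-ins, not Validio's validators.

```python
def usable_rows(rows, field, nullable):
    """Drop rows whose field is None when the field is not marked nullable."""
    if nullable:
        return list(rows)
    return [row for row in rows if row.get(field) is not None]


def row_count(rows, field, nullable):
    """Count the rows that remain after the nullability filter."""
    return len(usable_rows(rows, field, nullable))


def share_of_null(rows, field, nullable):
    """Fraction of remaining rows whose field is None."""
    kept = usable_rows(rows, field, nullable)
    if not kept:
        return 0.0
    return sum(row.get(field) is None for row in kept) / len(kept)


# Hypothetical datapoints: two of the four have a null price.
rows = [{"price": 10}, {"price": None}, {"price": 30}, {"price": None}]

count_not_nullable = row_count(rows, "price", nullable=False)
count_nullable = row_count(rows, "price", nullable=True)
null_share = share_of_null(rows, "price", nullable=True)
```

With the field not marked nullable, only the two non-null datapoints are counted. With it marked nullable, all four are counted and the share of null is 0.5, which is why null-focused metrics only make sense when the checkbox is selected.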