Databricks

Create a Databricks Source

Prepare credentials and permissions in Databricks

Validio needs certain credentials and permissions in Databricks to validate your data.

For detailed information about permissions in Databricks, refer to Authentication and access control.

1. Create a Databricks access token

Create a Databricks access token that Validio uses to access Databricks.
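Databricks REST APIs accept an access token as a bearer token. This means you can smoke-test a new token before you enter it in Validio by calling a read-only endpoint such as the SQL Warehouses list endpoint. The following minimal sketch only builds the request with Python's standard library. The function name is hypothetical, and sending the request with `urllib.request.urlopen()` requires network access to your workspace.

```python
import urllib.request


def build_token_check_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists SQL Warehouses.

    Hypothetical helper for a quick token smoke test; the token is sent
    as a bearer token in the Authorization header.
    """
    # Accept a host pasted with or without scheme or trailing slash.
    host = host.removeprefix("https://").removeprefix("http://").rstrip("/")
    url = f"https://{host}/api/2.0/sql/warehouses"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Passing the returned request to `urllib.request.urlopen()` should succeed if the workspace accepts the token. An HTTP 401 or 403 error points to a token or permission problem.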

2. Grant permissions on data

Run the statement that matches your metastore:

  • Unity Catalog:
    GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG some_catalog TO user_or_validio_service_principal;
  • Hive metastore:
    GRANT USAGE, READ_METADATA, SELECT ON CATALOG some_catalog TO user_or_validio_service_principal;
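The two statements differ only in their privilege list, so a small helper can produce the right one for each metastore. This is a hypothetical sketch. The function name and the `unity_catalog` and `hive_metastore` keys are illustrative. The Unity Catalog list includes USE CATALOG and USE SCHEMA because Unity Catalog requires them, alongside SELECT, to read a table. Principals that contain `@` or `-` (user e-mail addresses, service principal application IDs) must be wrapped in backticks.

```python
# Privileges per metastore type (keys are illustrative labels).
PRIVILEGES = {
    # Unity Catalog: reading a table needs USE CATALOG and USE SCHEMA on its
    # parents plus SELECT; granting on the catalog is inherited by its children.
    "unity_catalog": "USE CATALOG, USE SCHEMA, SELECT",
    # Legacy Hive metastore table access control.
    "hive_metastore": "USAGE, READ_METADATA, SELECT",
}


def build_grant_statement(metastore: str, catalog: str, principal: str) -> str:
    """Return the GRANT statement giving a principal read access to a catalog."""
    if metastore not in PRIVILEGES:
        raise ValueError(f"unknown metastore type: {metastore!r}")
    # Quote the principal with backticks, doubling any embedded backtick.
    quoted = "`" + principal.replace("`", "``") + "`"
    return f"GRANT {PRIVILEGES[metastore]} ON CATALOG {catalog} TO {quoted};"
```

For example, `build_grant_statement("unity_catalog", "some_catalog", "validio@example.com")` yields a statement you can paste into the Databricks SQL editor.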

3. Create a Databricks SQL Warehouse for Validio

Create a dedicated SQL Warehouse that Validio uses to query data for validation. We recommend starting with a 2X-Small warehouse with Auto Stop set to the minimum value. If your validation workload outgrows this size, increase the warehouse size as needed.
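If you prefer the Databricks SQL Warehouses REST API (`POST /api/2.0/sql/warehouses`) to the UI, the request body could look like the one built below. This is a sketch under stated assumptions: the function name is hypothetical, the field names are those we understand the API to accept, and the default `auto_stop_mins` of 10 is a placeholder. Check the lowest value your warehouse type allows.

```python
import json


def warehouse_create_payload(name: str = "validio", auto_stop_mins: int = 10) -> str:
    """Return a JSON body for creating a small SQL Warehouse (sketch).

    auto_stop_mins is a placeholder; use the lowest value your warehouse type accepts.
    """
    # 0 would disable Auto Stop, which defeats the purpose of a small idle-stopping warehouse.
    if auto_stop_mins < 1:
        raise ValueError("auto_stop_mins must be a positive number of minutes")
    body = {
        "name": name,
        "cluster_size": "2X-Small",  # recommended starting size
        "auto_stop_mins": auto_stop_mins,
        "min_num_clusters": 1,
        "max_num_clusters": 1,  # no scaling out until validations need more capacity
    }
    return json.dumps(body)
```

Keeping `max_num_clusters` at 1 mirrors the advice to start small and scale up only when needed.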

Grant the user or Validio service principal the Can Use permission on the SQL Warehouse.
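When you manage warehouse permissions through the Databricks Permissions REST API instead of the UI, the request body lists each principal together with a permission level. The following sketch builds such a body with the CAN_USE level. The function name is hypothetical, and you should confirm the exact endpoint for a warehouse in the Databricks REST API reference. Service principals are identified by their application ID and users by their e-mail address.

```python
import json


def can_use_acl(principal: str, is_service_principal: bool) -> str:
    """Return a JSON access-control list granting CAN_USE on a SQL Warehouse (sketch)."""
    # Service principals use their application ID; users use their e-mail address.
    key = "service_principal_name" if is_service_principal else "user_name"
    body = {"access_control_list": [{key: principal, "permission_level": "CAN_USE"}]}
    return json.dumps(body)
```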

Create the Source in Validio

Credential parameters

| Field | Required | Description | Example |
|---|---|---|---|
| Name | Yes | Identifier for the credential. Sources refer to the credential by this name. | service_account_product_staging |
| Access token | Yes | Enter or paste the Databricks access token. | |

Configuration parameters

| Field | Required | Description | Example |
|---|---|---|---|
| Name | Yes | Identifier for the Source. Used when setting up validators. | East_coast_weather_forecast |
| Host name | Yes | Server hostname of the Databricks workspace, as shown in the SQL Warehouse connection details. | 123456789.1.gcp.databricks.com |
| Port | Yes | SQL Warehouse port. | 443 |
| HTTP Path | Yes | SQL Warehouse HTTP path. | /sql/1.0/warehouses/asd12a3asd123 |
| Cron preset | No | How often to query the table for new data, chosen from a preset option. Select custom to use your own cron expression. | |
| Cron expression | Yes | How often to query the table for new data, as a cron expression. Displays the expression of the selected preset, or lets you enter your own cron expression. | |
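Most connection problems come from a pasted URL in Host name or a wrong HTTP path. A hypothetical pre-flight check like the one below can catch these before you save the Source. It is not part of Validio. It assumes that warehouse HTTP paths follow the `/sql/1.0/warehouses/<id>` pattern from the example above and that the cron expression uses the standard five fields (minute, hour, day of month, month, day of week). Validio may accept other forms.

```python
import re

# Assumed pattern, based on the documented example /sql/1.0/warehouses/asd12a3asd123.
HTTP_PATH_RE = re.compile(r"^/sql/1\.0/warehouses/[A-Za-z0-9]+$")


def check_source_config(host: str, port: int, http_path: str, cron: str) -> list[str]:
    """Return a list of problems in the Source parameters (empty list if none)."""
    problems = []
    # Host name should be a bare host, e.g. 123456789.1.gcp.databricks.com.
    if "://" in host or "/" in host:
        problems.append("Host name must not include a scheme or path")
    if not 1 <= port <= 65535:
        problems.append("Port must be between 1 and 65535")
    if not HTTP_PATH_RE.match(http_path):
        problems.append("HTTP Path should look like /sql/1.0/warehouses/<id>")
    # Assumption: standard five-field cron syntax.
    if len(cron.split()) != 5:
        problems.append("Cron expression must have five space-separated fields")
    return problems
```

With the example values from the table and the expression `0 * * * *` (every hour on the hour), the check returns an empty list.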