Databricks

Create a Databricks Source

Prepare credentials and permissions in Databricks

Validio requires certain credentials and permissions to validate your data.

📘

For detailed information about permissions in Databricks, refer to Authentication and access control.

1. Create a Databricks access token

Create a Databricks access token to allow Validio access to Databricks.

2. Grant permissions on data

Proceed with the method that corresponds to your metastore:

  • Unity Catalog:
    GRANT SELECT ON CATALOG some_catalog TO user_or_validio_service_principal;
  • Hive metastore:
    GRANT USAGE, READ_METADATA, SELECT ON CATALOG some_catalog TO user_or_validio_service_principal;
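
The two statements above differ only in their privilege list, so they can be generated from one helper when you need to grant access on several catalogs. The following is a minimal sketch: `grant_statement` and `PRIVILEGES` are hypothetical names used for illustration and are not part of Validio or Databricks.

```python
# Privilege lists copied from the two GRANT statements above.
PRIVILEGES = {
    "unity_catalog": ["SELECT"],
    "hive_metastore": ["USAGE", "READ_METADATA", "SELECT"],
}


def grant_statement(metastore: str, catalog: str, principal: str) -> str:
    """Build the GRANT statement for the given metastore type.

    Hypothetical helper for illustration only; identifiers are inserted
    as given, without quoting or escaping.
    """
    if metastore not in PRIVILEGES:
        raise ValueError(f"unknown metastore type: {metastore!r}")
    privileges = ", ".join(PRIVILEGES[metastore])
    return f"GRANT {privileges} ON CATALOG {catalog} TO {principal};"


if __name__ == "__main__":
    print(grant_statement("unity_catalog", "some_catalog",
                          "user_or_validio_service_principal"))
```

The helper does no quoting, so a principal that contains special characters (for example an email address) would need to be wrapped in backticks before it is passed in.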

3. Create a Databricks SQL Warehouse for Validio

Create a dedicated SQL Warehouse for Validio to query data for validation. We recommend starting with a 2X-Small Warehouse and setting Auto Stop to the minimum. If your validation needs exceed the capacity of this SQL Warehouse, you can increase its size.

Grant the user or Validio service principal the Can Use permission on the SQL Warehouse.
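
If you script the warehouse instead of creating it in the UI, the recommendation above maps onto a request body for the Databricks SQL Warehouses REST API (`POST /api/2.0/sql/warehouses`). The sketch below only builds that body and sends nothing. The helper name `warehouse_payload` and the warehouse name are hypothetical, and the `auto_stop_mins` value is deliberately left to the caller: the minimum Auto Stop that is accepted depends on your warehouse type, so confirm it in your workspace.

```python
def warehouse_payload(name: str, auto_stop_mins: int,
                      cluster_size: str = "2X-Small") -> dict:
    """Request body for a small, dedicated SQL Warehouse.

    Hypothetical helper: it only assembles the JSON body and does not
    call the Databricks API.
    """
    if auto_stop_mins <= 0:
        # A non-positive value would not give the Auto Stop behaviour
        # recommended above, so reject it here.
        raise ValueError("auto_stop_mins must be a positive number of minutes")
    return {
        "name": name,
        "cluster_size": cluster_size,
        "auto_stop_mins": auto_stop_mins,
    }
```

For example, `warehouse_payload("validio-warehouse", auto_stop_mins=10)` returns a dictionary that can be serialized with `json.dumps` and sent with an HTTP client of your choice, authenticated with a token that is allowed to create warehouses.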

Create the Source in Validio

Credential parameters

| Field | Required | Description | Example |
|---|---|---|---|
| Name | | Identifier for the credentials. Used when accessing Sources. | service_acount_product_staging |
| Access token | | Enter, or paste, the access token. | |

Configuration parameters

| Field | Required | Description | Example |
|---|---|---|---|
| Name | | Identifier for the Source. Used when setting up validators. | East_coast_weather_forecast |
| Host name | | Databricks account host. | 123456789.1.gcp.databricks.com |
| Port | | SQL Warehouse port. | 443 |
| HTTP Path | | SQL Warehouse HTTP Path. | /sql/1.0/warehouses/asd12a3asd123 |
| Cron preset | | Determines how often to query the table for new data, based on a preset option. Select custom to use your own cron expression. | |
| Cron expression | | Determines how often to query the table for new data, based on a cron expression. The expressions for cron presets are displayed here. You can also enter your own cron expression. | |
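
Before saving the Source, it can help to sanity-check the values you are about to enter. The sketch below is a hypothetical pre-flight check and is not part of Validio. It assumes that the host name is entered without a scheme, that the HTTP Path follows the `/sql/...` shape shown in the example, and that cron expressions use the standard five-field format; verify which cron dialect Validio accepts before relying on the last check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabricksSourceConfig:
    """Values from the configuration parameters table (hypothetical class)."""
    name: str
    host_name: str
    port: int
    http_path: str
    cron_expression: str

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        if "/" in self.host_name:
            # Catches both "https://host" and "host/some/path".
            found.append("host name should be a bare host, without scheme or path")
        if not 1 <= self.port <= 65535:
            found.append("port must be between 1 and 65535")
        if not self.http_path.startswith("/sql/"):
            found.append("HTTP Path should start with /sql/")
        if len(self.cron_expression.split()) != 5:
            # Assumption: standard five-field cron (minute hour day month weekday).
            found.append("cron expression should have five fields")
        return found


if __name__ == "__main__":
    config = DatabricksSourceConfig(
        name="East_coast_weather_forecast",
        host_name="123456789.1.gcp.databricks.com",
        port=443,
        http_path="/sql/1.0/warehouses/asd12a3asd123",
        cron_expression="0 * * * *",  # assumed example: once an hour
    )
    print(config.problems())
```

The check only looks at the shape of each value; it does not contact Databricks, so a configuration that passes can still fail if the token, warehouse, or permissions are wrong.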