Databricks
Create a Databricks Source
Prepare credentials and permissions in Databricks
Validio requires certain credentials and permissions to validate data from Databricks.
For detailed information about permissions in Databricks, refer to Authentication and access control in the Databricks documentation.
Create a Databricks access token
Create a Databricks access token to give Validio access to Databricks.
- Create an access token for a service principal. This is the recommended approach.
- Alternatively, create a personal access token for a user.
Grant permissions on data
Proceed with the method that corresponds to your metastore.
- Unity Catalog:
GRANT SELECT ON CATALOG some_catalog TO user_or_validio_service_principal;
- Hive metastore:
GRANT USAGE, READ_METADATA, SELECT ON CATALOG some_catalog TO user_or_validio_service_principal;
Note
Validio Data Lineage is only supported for data from Unity Catalog.
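When several catalogs or principals are involved, the GRANT statements above can be generated rather than typed by hand. The following is a minimal sketch, not part of Validio: the `grant_statement` helper, the metastore keys, and the example principal are hypothetical. It wraps the principal in backticks, which Databricks SQL expects for names containing special characters such as email addresses.

```python
# Privileges per metastore type, copied from the GRANT statements above.
PRIVILEGES = {
    "unity_catalog": "SELECT",
    "hive_metastore": "USAGE, READ_METADATA, SELECT",
}


def grant_statement(metastore, catalog, principal):
    """Render the GRANT statement for one catalog and one principal."""
    if metastore not in PRIVILEGES:
        raise ValueError(f"Unknown metastore type: {metastore!r}")
    if "`" in principal:
        # Keep the sketch simple: refuse names that would break the quoting.
        raise ValueError("Principal must not contain a backtick")
    privileges = PRIVILEGES[metastore]
    return f"GRANT {privileges} ON CATALOG {catalog} TO `{principal}`;"


if __name__ == "__main__":
    # Hypothetical principal; replace it with your user or service principal.
    print(grant_statement("unity_catalog", "some_catalog", "validio@example.com"))
```

Run the generated statement in a Databricks SQL editor or notebook as a user who is allowed to grant privileges on the catalog.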
Create a Databricks SQL Warehouse for Validio
Create a dedicated SQL Warehouse for Validio to query data for validation. We recommend starting with a 2X-Small Warehouse and setting Auto Stop to the minimum value. If your validation needs exceed the capacity of such a SQL Warehouse, you can increase the size as needed.
Give the user or Validio service principal Can Use permission on the SQL Warehouse.
Create the Source in Validio
Credential parameters
Field | Required | Description | Example |
---|---|---|---|
Name | ✅ | Identifier for the credentials. Used when accessing Sources. | service_account_product_staging |
Access token | ✅ | Enter or paste the access token. | |
Configuration parameters
Field | Required | Description | Example |
---|---|---|---|
Name | ✅ | Identifier for the Source. Used when setting up validators. | East_coast_weather_forecast |
Host name | ✅ | Databricks account host. | 123456789.1.gcp.databricks.com |
Port | ✅ | SQL Warehouse port. | 443 |
HTTP Path | ✅ | SQL Warehouse HTTP Path. | /sql/1.0/warehouses/asd12a3asd123 |
Cron preset | | Determines how often to query the table for new data, based on a preset option. Select custom to use your own cron expression. | |
Cron expression | ✅ | Determines how often to query the table for new data, based on a cron expression. The expression for the selected cron preset is displayed here. Edit this field to enter your own cron expression. | |
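Before saving the Source, it can help to confirm that the host name, HTTP path, and access token work together. The following is a minimal sketch, not part of Validio, assuming the open-source `databricks-sql-connector` Python package is installed. The helper names `build_connection_args` and `check_connection` are hypothetical, and the sketch assumes the connector's default HTTPS port matches the port 443 shown in the table.

```python
def build_connection_args(host_name, http_path, access_token):
    """Normalize the values from the configuration table into connector arguments."""
    host = host_name.strip()
    # A pasted workspace URL may include the scheme or a trailing slash; keep only the host.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if not host:
        raise ValueError("host_name is empty")
    path = http_path.strip()
    if not path.startswith("/"):
        path = "/" + path
    return {
        "server_hostname": host,
        "http_path": path,
        "access_token": access_token,
    }


def check_connection(host_name, http_path, access_token):
    """Run a trivial query on the SQL Warehouse and report whether it succeeded."""
    # Imported lazily so the helper above works without the package installed.
    from databricks import sql

    args = build_connection_args(host_name, http_path, access_token)
    with sql.connect(**args) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone()[0] == 1
```

Call `check_connection` with the same values you plan to enter in Validio, reading the access token from an environment variable rather than hard-coding it. Note that the first query may take a while if the SQL Warehouse has to start up after an Auto Stop.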