Databricks

Create a Databricks Source

Prepare credentials and permissions in Databricks

Validio needs specific credentials and permissions to validate data from Databricks.

โ—๏ธ

Credential Permission Requirements

Validio credentials need only VIEWER access rights to connect to sources and read data. Admins must ensure that they do not grant EDITOR access rights to these credentials.


For detailed information about permissions in Databricks, refer to Authentication and access control in Databricks documentation.

Create a Databricks Access Token

An access token is needed to connect Validio to Databricks. We recommend that you create a service principal and generate an access token for that service principal.

  1. Create a service principal in the Databricks Account Console by going to Admin Settings -> Identity and Access -> Manage Service Principals -> Add service principal. Save the service principal's application ID.
  2. Give access token permissions to the service principal, following these instructions in Databricks documentation.
  3. Generate an access token for the service principal, following these instructions in Databricks documentation.

Grant Permissions to the Validio Service Principal

Proceed with the method that corresponds to your metastore.

  • Unity Catalog:
    GRANT SELECT ON CATALOG <catalog-name> TO <user_or_validio_service_principal>;
  • Hive Metastore:
    GRANT USAGE, READ_METADATA, SELECT ON CATALOG <catalog-name> TO <user_or_validio_service_principal>;
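
The two statements above differ only in their privilege list, so they can be generated from a small lookup. The following is a minimal sketch, not part of Validio or Databricks: the function name `build_grant_statement` and the metastore keys are hypothetical, and the backtick quoting assumes that the catalog and principal names are ordinary Databricks SQL identifiers.

```python
# Privilege lists copied from the two GRANT statements above.
PRIVILEGES = {
    "unity_catalog": "SELECT",
    "hive_metastore": "USAGE, READ_METADATA, SELECT",
}


def quote_identifier(identifier: str) -> str:
    """Backtick-quote an identifier so hyphens or '@' do not break the statement."""
    return "`" + identifier.replace("`", "``") + "`"


def build_grant_statement(metastore: str, catalog: str, principal: str) -> str:
    """Return the GRANT statement for the given metastore type (hypothetical helper)."""
    if metastore not in PRIVILEGES:
        raise ValueError(f"unknown metastore type: {metastore!r}")
    return (
        f"GRANT {PRIVILEGES[metastore]} ON CATALOG {quote_identifier(catalog)} "
        f"TO {quote_identifier(principal)};"
    )
```

For example, `build_grant_statement("unity_catalog", "main", "validio-sp")` produces a statement that you can paste into a Databricks SQL editor. The catalog and principal names here are placeholders.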

Note

Validio Data Lineage is only supported for data from Unity Catalog.

Create a Databricks SQL Warehouse for Validio

Create a dedicated SQL Warehouse for Validio to query data for validation. We recommend starting with a 2X-Small Warehouse and setting Auto Stop to the minimum value. If your validation workload outgrows this SQL Warehouse, increase its size as needed.

Grant the user or Validio service principal the Can Use permission on the SQL Warehouse.
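
If you create the warehouse through automation rather than the UI, the recommended settings can be captured in a request body. The sketch below only builds the JSON and does not send it. It assumes the Databricks SQL Warehouses REST API (`POST /api/2.0/sql/warehouses`) with its `name`, `cluster_size`, and `auto_stop_mins` fields, so verify these against the current Databricks API reference. The function name is hypothetical, and the smallest allowed Auto Stop value is not stated here, so the caller must supply it.

```python
import json


def warehouse_request_body(name: str, auto_stop_mins: int) -> str:
    """Build a JSON body for a small, dedicated warehouse (hypothetical helper).

    auto_stop_mins is left to the caller because the minimum value that
    Databricks accepts is an assumption to verify, not something given here.
    """
    if auto_stop_mins < 0:
        raise ValueError("auto_stop_mins must not be negative")
    body = {
        "name": name,
        "cluster_size": "2X-Small",  # recommended starting size
        "auto_stop_mins": auto_stop_mins,
    }
    return json.dumps(body, sort_keys=True)
```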

Add a Databricks Credential

To add a credential for Databricks:

  1. Navigate to Credentials and click + New Credential.
  2. Under Namespace, select a namespace where the resources will be created.
  3. For Credential Type, select Databricks Credential.
  4. Fill in the credential parameter fields. Refer to the Databricks Credential Parameters table.
  5. Check Use for catalog to automatically discover credentials and add them to the catalog page.
  6. Click Create.

Validio will validate the connection to the Databricks account. If validation passes, Validio will automatically start fetching data. If validation fails, check that you provided the correct parameter values and try again.

Once the credential is created, you can add a source to monitor Databricks data.

Databricks Credential Parameters

| Field | Description | Example |
|---|---|---|
| Name | Identifier for the credentials. Used when accessing Sources. | service_acount_product_staging |
| Host | Databricks account host. | 123456789.1.gcp.databricks.com |
| Port | SQL warehouse port. | 443 |
| HTTP Path | SQL warehouse URL. | /sql/1.0/warehouses/c0aa12c3456c789 |
| Access token | Enter or paste the access token created in Databricks. | n/a |
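
Before you fill in the form, it can help to sanity-check the parameter values locally. The following is a rough sketch and not Validio's own validation: the class name, the function name, and every rule (bare hostname, port range, leading slash) are assumptions based on the example values in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabricksCredentialParams:
    """Hypothetical container mirroring the credential parameter fields."""
    name: str
    host: str
    port: int
    http_path: str
    access_token: str


def find_problems(params: DatabricksCredentialParams) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    if not params.name.strip():
        problems.append("Name is empty")
    # The example host is a bare hostname, so reject a scheme or a path.
    if "://" in params.host or "/" in params.host:
        problems.append("Host should be a bare hostname without scheme or path")
    if not 1 <= params.port <= 65535:
        problems.append("Port is outside the valid TCP range")
    if not params.http_path.startswith("/"):
        problems.append("HTTP Path should start with '/'")
    if not params.access_token:
        problems.append("Access token is empty")
    return problems
```

A common mistake that this catches is pasting the full workspace URL, including `https://`, into the Host field.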

Add a Databricks Source

To add a source for Databricks:

  1. Navigate to Sources and click + New source.
  2. Under Source type, select Databricks.
  3. Under Config:
    1. Select the valid Credential or create a new credential to authenticate your connection to the data warehouse.
    2. Enter the Database, Schema, and Table to specify where the data comes from. Selecting more than one table will create a new source for each table.
    3. Set how many days of Historic data to use when you start the source.
    4. Set the Polling schedule, which is how frequently the validators on the source will check for changes.
  4. Under Schema, click Continue to automatically infer the schema fields from the tables you selected. If you select many tables, this operation can take a few minutes to complete.
  5. Under Source details:
    1. Add Tags to help group related sources or to use for routing notifications.
    2. Add an Owner who will be the contact for incident notifications.
  6. Click Continue to create the source.
    Source names are generated automatically and are displayed when source creation completes. If more than five sources are created, you will see the names of the first five and a count of the remaining sources.
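
The display rule in the last step (show the first five names, then count the rest) can be illustrated with a short sketch. This is not Validio's implementation. The function name and the exact wording of the summary are hypothetical.

```python
def summarize_source_names(names: list[str], shown: int = 5) -> str:
    """List the first `shown` names and append a count of any remaining ones."""
    head = names[:shown]
    remaining = len(names) - len(head)
    summary = ", ".join(head)
    if remaining > 0:
        summary += f" and {remaining} more"
    return summary
```

Selecting seven tables creates seven sources, so the summary lists five names followed by "and 2 more".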