Connecting to Databricks Unity Catalog

17th April, 2024
Applies to: Dataedo 23.x versions, Article available also for: 24.x (current)

How to connect

Pre-requisites:

To scan Azure Databricks Unity Catalog, Dataedo connects to the Databricks workspace API and authenticates with a Personal Access Token (PAT). You need a Databricks workspace that is Unity Catalog enabled and attached to the metastore you want to scan. Dataedo needs the following information to connect to the Databricks instance:

NOTE: You can find all of these in your Databricks workspace.

  1. Workspace URL - once you've opened your Databricks workspace, copy the URL from the address bar of your web browser and paste it into Dataedo

  2. Token - a personal access token (PAT) that has to be generated in the profile settings of your Databricks workspace. See "How to generate the Personal Access Token?" below

  3. Catalog - you can either type in the Catalog name manually or later choose it from the list
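Before entering these values in Dataedo, it can help to confirm that the workspace URL and token work together. The sketch below is only an illustration, not Dataedo's own implementation: it assumes the Databricks Unity Catalog REST endpoint `/api/2.1/unity-catalog/catalogs` and Bearer-token authentication, and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def workspace_base_url(address_bar_url):
    """Reduce a URL copied from the browser address bar to scheme + host."""
    parts = urlsplit(address_bar_url.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected an https:// workspace URL")
    return "https://" + parts.netloc


def build_list_catalogs_request(workspace_url, token):
    """Build (but do not send) a 'list catalogs' request authenticated with a PAT."""
    # Assumption: the Unity Catalog REST API path for listing catalogs.
    url = workspace_base_url(workspace_url) + "/api/2.1/unity-catalog/catalogs"
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


def list_catalog_names(workspace_url, token):
    """Send the request and return the catalog names visible to the token's user."""
    request = build_list_catalogs_request(workspace_url, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [catalog["name"] for catalog in payload.get("catalogs", [])]
```

Calling `list_catalog_names(...)` with your own workspace URL and token should return the catalogs you can later pick in Dataedo; an HTTP 401 or 403 error usually points to an invalid token or missing privileges.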

Create Connection

  1. From the list of connections, select "Databricks"
  2. Enter connection details
  3. If you don't remember the Catalog name, you can select it from the list of available catalogs using the [...] button in Dataedo


Importing metadata

  1. Once the connection is successful, Dataedo reads the objects and shows a list of the objects it found. You can choose which objects to import.
  2. You can also use the advanced filter to narrow down the list of objects.

Outcome

  1. Metadata for your Unity Catalog has been imported into new documentation in the repository.

Hive documentation

  1. Lineage between the imported Unity Catalog objects has been documented. For more details, check here.

Watchouts

ℹ️ Important
  • For all the objects that you want to bring into Dataedo, the user needs to have at least SELECT privilege on tables/views, USE CATALOG on the object’s catalog, and USE SCHEMA on the object’s schema.
  • To scan all the objects in a Unity Catalog metastore, use a user with the metastore admin role. Learn more in the Databricks articles "Manage privileges in Unity Catalog" and "Unity Catalog privileges and securable objects".
  • For classification, the user also needs the SELECT privilege on the tables/views so that sample data can be retrieved.
  • If your Azure Databricks workspace doesn’t allow access from public network, or if your Microsoft Purview account doesn’t enable access from all networks, you can use the Managed Virtual Network Integration Runtime for scan. You can set up a managed private endpoint for Azure Databricks as needed to establish private connectivity.
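
The minimum privileges listed above can be granted with Databricks SQL `GRANT` statements. The following is a minimal sketch that only generates those statements as text for an administrator to review and run; the principal, catalog, schema, and table names are placeholders, and the helper functions are hypothetical rather than part of Dataedo or Databricks.

```python
def quote_identifier(name):
    """Wrap a name in backticks, doubling any backtick it contains."""
    return "`" + name.replace("`", "``") + "`"


def grant_statements(principal, catalog, schema, tables):
    """Return GRANT statements giving a principal the minimum scan privileges."""
    who = quote_identifier(principal)
    cat = quote_identifier(catalog)
    sch = cat + "." + quote_identifier(schema)
    statements = [
        # USE CATALOG on the object's catalog
        "GRANT USE CATALOG ON CATALOG {} TO {};".format(cat, who),
        # USE SCHEMA on the object's schema
        "GRANT USE SCHEMA ON SCHEMA {} TO {};".format(sch, who),
    ]
    for table in tables:
        # SELECT on each table or view to be imported
        statements.append(
            "GRANT SELECT ON TABLE {}.{} TO {};".format(sch, quote_identifier(table), who)
        )
    return statements


# Placeholder names for illustration only.
for statement in grant_statements("dataedo_scanner@example.com", "main", "sales", ["orders"]):
    print(statement)
```

Generating the statements instead of executing them keeps the sketch free of workspace credentials and lets an administrator check each grant before running it.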

How to generate the Personal Access Token?

Depending on your implementation scenario, follow the Databricks manual for generating personal access tokens. After opening the page, if you are using a cloud provider other than AWS, you can switch to the respective manual.