Overview
Databricks is a cloud-based data processing platform. It simplifies collaboration among data analysts, data engineers, and data scientists. Databricks is available on Microsoft Azure, Amazon Web Services, and Google Cloud Platform.
Dataedo connects to a single Unity Catalog catalog via API and documents the objects and data lineage within the connected catalog.
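To make "a single catalog via API" more concrete, the sketch below shows one way to enumerate the tables of one catalog through the Databricks Unity Catalog REST API. It is an illustration only, not Dataedo's actual implementation: the helper names (`build_url`, `get_json`, `list_catalog_tables`), the workspace host, and the token are assumptions, and pagination is ignored for brevity.

```python
import json
import urllib.parse
import urllib.request

# Unity Catalog REST prefix; host and token are supplied by the caller.
API_PREFIX = "/api/2.1/unity-catalog"


def build_url(host, resource, **params):
    """Build a Unity Catalog REST URL for `resource` ('schemas' or 'tables')."""
    query = urllib.parse.urlencode(params)
    return f"{host.rstrip('/')}{API_PREFIX}/{resource}?{query}"


def get_json(url, token):
    """Send an authenticated GET request and decode the JSON response."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_catalog_tables(host, token, catalog):
    """Return full names of tables and views in one catalog (first page only)."""
    names = []
    schemas = get_json(build_url(host, "schemas", catalog_name=catalog), token)
    for schema in schemas.get("schemas", []):
        url = build_url(
            host, "tables", catalog_name=catalog, schema_name=schema["name"]
        )
        for table in get_json(url, token).get("tables", []):
            names.append(table["full_name"])
    return names
```

A call such as `list_catalog_tables("https://<workspace-host>", "<token>", "main")` would return names only from the `main` catalog, which mirrors the one-catalog-per-connection scope described above.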
Connector features
Data Source | Support | Schema | Lineage | Profiling | Classification | Export comments | FK tester | DDL import |
---|---|---|---|---|---|---|---|---|
Databricks Unity Catalog | Native | ✅ | Object Level | ❌ | ✅ | ✅ | NA | NA |
Data Catalog
Dataedo will document the following objects and their respective properties from Databricks:
Object Name | Metadata | Lineage |
---|---|---|
Delta Live Tables | ✅ | ✅ |
Pipelines | Limited | ✅ |
Tables | ✅ | ✅ |
Views | ✅ | ✅ |
Columns | ✅ | ❌ |
External Tables | ✅ | ❌ |
Objects Properties Configuration & Support
Documentation is created for the selected Unity Catalog. If you need to connect multiple catalogs,
Known Limitations
Documentation Functionality
- Data Profiling is not supported for Databricks; however, we're working on this feature for future releases.
- Connection to multiple catalogs or a regional metastore is not yet supported (it is on the roadmap).
Lineage Functionality
- Only object-level lineage is currently discovered by Dataedo. Column-level lineage is not yet supported (it is on the roadmap).
- Lineage between different Unity Catalogs will not be discovered.
- For objects that exist at the workspace level, such as Pipelines, Dataedo will discover only the name, not the script.
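
The cross-catalog limitation can be pictured with a small sketch. It assumes lineage is available as (source, target) pairs of three-level names in the form `catalog.schema.object`; the function names and sample data are hypothetical and do not reflect how Dataedo stores lineage internally.

```python
def catalog_of(full_name):
    """Return the catalog part of a three-level name 'catalog.schema.object'."""
    return full_name.split(".", 1)[0]


def within_catalog(edges, catalog):
    """Keep only lineage edges whose source and target are both in `catalog`."""
    return [
        (source, target)
        for source, target in edges
        if catalog_of(source) == catalog and catalog_of(target) == catalog
    ]


# Hypothetical object-level lineage edges.
edges = [
    ("main.raw.orders", "main.curated.orders_clean"),
    ("finance.raw.invoices", "main.curated.revenue"),  # crosses catalogs
]

visible = within_catalog(edges, "main")
```

With `main` as the connected catalog, only the first edge remains in `visible`; the edge that starts in `finance` is the kind of relationship that would not be discovered.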