Databricks (Unity Catalog and Hive Metastore)

Michal Adamczyk, Dataedo Team, 21st August, 2024
Applies to: Dataedo 24.x (current) versions. Article also available for: 23.x

Databricks is a cloud-based, fast, and collaborative Apache Spark-based analytics platform that provides a unified analytics environment for data engineering, data science, and data analytics workloads.

By using Dataedo's data catalog and documentation capabilities, Databricks users can gain a deeper understanding of their data assets, including data lineage, which leads to better data-driven decision-making.

Dataedo offers two connectors for Databricks:

  • Databricks Unity Catalog - Newer and recommended. Continuously enhanced with new features. Uses the Databricks API.
  • Databricks Hive Metastore - Older and not currently being expanded. Uses a native connector to an Apache Hive Metastore to document metadata in Databricks.