Cloudera

Starting with version 24.4, Dataedo supports a Cloudera Data Catalog connector that lets you import individual databases cataloged within Cloudera, making it easier to keep data assets organized and accessible.

Documenting Cloudera Data Catalog

Dataedo imports the following Cloudera Data Catalog elements:

Cloudera Data Catalog		Dataedo
Data asset	→	Object
Technical Metadata	→	Fields + Custom Fields
Classification	→	Fields + Custom Fields
Business Metadata	→	Fields + Custom Fields
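To make the table above more concrete, the sketch below groups a single entity, as returned by the Apache Atlas REST API v2 (the endpoint Dataedo connects to), into the Cloudera elements listed in the table. It assumes the Atlas entity JSON keys `typeName`, `attributes`, `classifications`, and `businessAttributes`. The function name `split_atlas_entity`, the sample entity, and the flattened `namespace.attribute` keys are hypothetical and do not describe how Dataedo stores this data internally.

```python
from typing import Any


def split_atlas_entity(entity: dict[str, Any]) -> dict[str, Any]:
    """Group an Atlas v2 entity by the Cloudera elements listed in the table (hypothetical helper)."""
    attributes = entity.get("attributes", {})
    return {
        # Data asset -> Object: identified by its Atlas type and qualified name
        "object": {
            "type": entity.get("typeName"),
            "qualified_name": attributes.get("qualifiedName"),
        },
        # Technical metadata -> the entity's plain attributes
        "technical": dict(attributes),
        # Classification -> list of classification type names (e.g. "PII")
        "classifications": [c["typeName"] for c in entity.get("classifications", [])],
        # Business metadata -> {namespace: {attribute: value}} flattened to "namespace.attribute"
        "business": {
            f"{namespace}.{name}": value
            for namespace, attrs in entity.get("businessAttributes", {}).items()
            for name, value in attrs.items()
        },
    }


# Hypothetical sample entity, shaped like an Atlas v2 response
sample = {
    "typeName": "hive_table",
    "guid": "a1b2",
    "attributes": {"qualifiedName": "adventureworks.customer@cm", "owner": "etl"},
    "classifications": [{"typeName": "PII"}],
    "businessAttributes": {"Finance": {"costCenter": "CC-01"}},
}
print(split_atlas_entity(sample)["business"])  # {'Finance.costCenter': 'CC-01'}
```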

Data Lineage

The Cloudera connector builds lineage only within a single import; it does not create lineage to other technologies or to other Cloudera imports.
The Cloudera connector builds object-level lineage.
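The single-import rule can be illustrated with the payload Apache Atlas returns from its lineage endpoint (`GET /api/atlas/v2/lineage/{guid}`), where each entry in `relations` carries `fromEntityId` and `toEntityId`, and `guidEntityMap` holds entity headers with a `displayText`. The sketch below is an assumption about the general idea, not Dataedo's implementation: it keeps only object-to-object edges whose ends both belong to the set of imported GUIDs and drops anything pointing outside the import. `single_import_lineage` and the sample GUIDs are hypothetical.

```python
def single_import_lineage(lineage: dict, imported_guids: set[str]) -> list[tuple[str, str]]:
    """Return (source, target) display names for lineage edges inside one import."""
    entity_map = lineage.get("guidEntityMap", {})

    def label(guid: str) -> str:
        # Prefer the readable displayText from the entity header, fall back to the GUID
        return entity_map.get(guid, {}).get("displayText", guid)

    edges = set()
    for relation in lineage.get("relations", []):
        source, target = relation["fromEntityId"], relation["toEntityId"]
        # Edges that reach entities outside this import (other technologies
        # or other Cloudera imports) are skipped
        if source in imported_guids and target in imported_guids:
            edges.add((label(source), label(target)))
    return sorted(edges)


# Hypothetical lineage response
lineage = {
    "guidEntityMap": {
        "g1": {"displayText": "sales_raw"},
        "g2": {"displayText": "load_sales"},
        "g3": {"displayText": "sales_report"},
    },
    "relations": [
        {"fromEntityId": "g1", "toEntityId": "g2"},
        {"fromEntityId": "g2", "toEntityId": "g3"},
        {"fromEntityId": "g3", "toEntityId": "g9"},  # g9 belongs to another import
    ],
}
print(single_import_lineage(lineage, {"g1", "g2", "g3"}))
# [('load_sales', 'sales_report'), ('sales_raw', 'load_sales')]
```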

Connecting to Cloudera Data Catalog

Add new connection

To connect to a Cloudera Data Catalog instance, create new documentation by clicking Add and choosing New connection.

On the connection screen, choose Cloudera Data Catalog (it can be found under the Catalogs folder).

Provide the connection details:

Host - the Apache Atlas endpoint, available in the Environment tab (Environment → Data Lake → Endpoints → Atlas Endpoint).
User and password - the workload user's login and password.
SSL mode - choose the SSL mode based on your security needs.
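As a rough way to check the Host and credentials outside Dataedo, the sketch below builds an authenticated request to the Atlas REST API v2 using only the Python standard library. It assumes the workload user authenticates with HTTP Basic authentication and that `/api/atlas/v2/...` can be appended to the Atlas endpoint (with or without an existing `/api/atlas` suffix). The helper names and the `verify` flag, which stands in for the SSL mode choice, are hypothetical. `types/typedefs/headers` is a standard Atlas endpoint that lists type definitions.

```python
import base64
import ssl
import urllib.request


def atlas_request(host: str, user: str, password: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the Atlas REST API v2."""
    base = host.rstrip("/")
    # Accept endpoints copied with or without the /api/atlas suffix (assumption)
    if not base.endswith("/api/atlas"):
        base += "/api/atlas"
    url = f"{base}/v2/{path.lstrip('/')}"
    # Workload user credentials sent as HTTP Basic authentication (assumption)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"}
    )


def ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Hypothetical stand-in for the SSL mode: verify server certificates or not."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


req = atlas_request("https://atlas.example.com/", "workload-user", "secret", "types/typedefs/headers")
print(req.full_url)  # https://atlas.example.com/api/atlas/v2/types/typedefs/headers
```

To actually send the request, pass it to `urllib.request.urlopen(req, context=ssl_context())`; a JSON response with HTTP status 200 suggests the endpoint and credentials are usable.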

Setting Service Type and Objects

To import Cloudera Data Catalog correctly, Dataedo needs three pieces of information:

Service Type - the technology you want to import, e.g., Hive or HBase.
Main Object Type - the object type within the selected Service Type that should be treated as the main one, e.g., Hive Database. Because Cloudera Data Catalog stores metadata as a graph, Dataedo needs to know which object to treat as the Database/Storage.
Specific Object - the object of the selected Main Object Type that you want to import, e.g., the AdventureWorks database.
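These three choices narrow the catalog graph down to one starting entity. As an illustration only, the sketch below maps a Service Type to a plausible Main Object Type using standard Atlas type names (`hive_db` for Hive, `hbase_namespace` for HBase) and builds an Atlas basic-search path that would list candidate Specific Objects. The `MAIN_OBJECT_TYPES` table and the `specific_object_search` function are hypothetical; the returned path is relative to the Atlas v2 base URL (`.../api/atlas/v2/`).

```python
from urllib.parse import urlencode

# Hypothetical defaults: Service Type -> Atlas type treated as the Main Object Type
MAIN_OBJECT_TYPES = {
    "hive": "hive_db",
    "hbase": "hbase_namespace",
}


def specific_object_search(service_type: str, object_name: str, limit: int = 25) -> str:
    """Return an Atlas basic-search path listing main objects that match object_name."""
    type_name = MAIN_OBJECT_TYPES[service_type.lower()]
    params = {
        "typeName": type_name,
        "query": object_name,
        "excludeDeletedEntities": "true",  # skip entities marked as deleted in Atlas
        "limit": limit,
    }
    return "search/basic?" + urlencode(params)


print(specific_object_search("Hive", "AdventureWorks"))
# search/basic?typeName=hive_db&query=AdventureWorks&excludeDeletedEntities=true&limit=25
```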

Mapping Cloudera Data Catalog Service to Dataedo

This is a crucial step in importing Cloudera Data Catalog into Dataedo. It lets you:

Pick a Dataedo object type for each imported Cloudera entity type.
Choose whether to import specific Cloudera data assets.
Map attributes.
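The first two choices on this screen amount to a lookup from Cloudera entity type to Dataedo object type, where an empty choice means "do not import". The sketch below shows that idea with hypothetical names: `TYPE_MAPPING` and `plan_import` are illustrative, the Dataedo type labels are examples rather than the exact list offered in the UI, and entity types without any mapping are also skipped.

```python
from typing import Optional

# Hypothetical user choices: Atlas entity type -> Dataedo object type (None = skip)
TYPE_MAPPING: dict[str, Optional[str]] = {
    "hive_table": "Table",
    "hive_process": None,  # deliberately excluded from the import
}


def plan_import(entities: list[dict], mapping: dict[str, Optional[str]]) -> list[tuple[str, str]]:
    """Return (guid, Dataedo object type) for every entity that should be imported."""
    plan = []
    for entity in entities:
        # Unmapped types behave like explicitly skipped ones
        target = mapping.get(entity["typeName"])
        if target is not None:
            plan.append((entity["guid"], target))
    return plan


entities = [
    {"guid": "g1", "typeName": "hive_table"},
    {"guid": "g2", "typeName": "hive_process"},
    {"guid": "g3", "typeName": "hive_storagedesc"},
]
print(plan_import(entities, TYPE_MAPPING))  # [('g1', 'Table')]
```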

Mapping attributes

Attribute mapping is a separate form, opened by clicking Map attributes. In this form, you can browse all Cloudera attributes and choose the Dataedo attribute or field each one should be loaded into. You can also create new Custom Fields and use them in the mapping.
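Conceptually, attribute mapping routes each Cloudera attribute either to a built-in Dataedo field or to a Custom Field. In the hypothetical sketch below, targets prefixed with `custom:` stand for Custom Fields created in the form, and attributes without a mapping are not loaded; the prefix convention, `ATTRIBUTE_MAPPING`, and `apply_attribute_mapping` are assumptions for illustration only.

```python
# Hypothetical mapping: Cloudera attribute -> Dataedo target
# ("custom:<name>" marks a Custom Field created for this mapping)
ATTRIBUTE_MAPPING = {
    "description": "Description",
    "owner": "custom:Data Owner",
    "Finance.costCenter": "custom:Cost Center",
}


def apply_attribute_mapping(attributes: dict, mapping: dict[str, str]) -> tuple[dict, dict]:
    """Split attribute values into built-in fields and Custom Fields."""
    fields, custom_fields = {}, {}
    for name, value in attributes.items():
        target = mapping.get(name)
        if target is None:
            continue  # unmapped attributes are not loaded
        if target.startswith("custom:"):
            custom_fields[target.removeprefix("custom:")] = value
        else:
            fields[target] = value
    return fields, custom_fields


fields, custom = apply_attribute_mapping(
    {"description": "Customer master data", "owner": "etl", "createTime": 1700000000},
    ATTRIBUTE_MAPPING,
)
print(fields)  # {'Description': 'Customer master data'}
print(custom)  # {'Data Owner': 'etl'}
```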

After setting up object and attribute mapping, you can import your Cloudera Data Catalog. Once you have mapped objects for a given service, the mapping can be reused in Import Changes and in Copy Connection.

Specifications

Imported objects

	Imported	Editable
Data assets	✅	✅
Technical properties	✅
User defined properties
Label
Classification	✅
Business Metadata	✅

Supported features

Feature	Is supported
Writing changes back
Data Profiling
CMD Import	✅
PK/FK relationship tester
Linked Sources

Data lineage

Source	Method	Version
Internal lineage (object-level)	REST API	24.4 (2024)
