Starting with Dataedo 24.4, we support a Cloudera Data Catalog connector. It lets users import single databases cataloged within Cloudera, making it easier to maintain organized and accessible data assets.
Documenting Cloudera Data Catalog
Dataedo imports the following Cloudera Data Catalog elements:
Cloudera Data Catalog | | Dataedo |
---|---|---|
Data asset | → | Object |
Technical Metadata | → | Fields + Custom Fields |
Classification | → | Fields + Custom Fields |
Business Metadata | → | Fields + Custom Fields |
Data Lineage
- The Cloudera connector builds lineage within a single import (it won't create lineage to other technologies or to other Cloudera imports).
- The Cloudera connector builds object-level lineage.
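The single-import scope described above can be pictured as a filter over lineage edges. This is a minimal sketch, not Dataedo's implementation; `LineageEdge`, `edges_within_import`, and the sample object names are hypothetical and serve only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    """Object-level lineage link between two data assets (hypothetical model)."""
    source: str
    target: str


def edges_within_import(edges, imported_objects):
    """Keep only edges whose source and target both belong to the same import."""
    imported = set(imported_objects)
    return [e for e in edges if e.source in imported and e.target in imported]


# Illustrative data: two objects come from the imported database, one does not.
imported = ["sales.orders", "sales.orders_daily"]
edges = [
    LineageEdge("sales.orders", "sales.orders_daily"),
    LineageEdge("crm.customers", "sales.orders"),  # crosses the import boundary
]
kept = edges_within_import(edges, imported)
```

In this sketch the edge from `crm.customers` is dropped because its source was not part of the import, which mirrors the limitation that lineage is not created to other imports.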
Connecting to Cloudera Data Catalog
Add new connection
To connect to a Cloudera Data Catalog instance, create new documentation by clicking Add and choosing New connection.
On the connection screen, choose Cloudera Data Catalog (it can be found under the Catalogs folder).
Provide database connection details:
- Host - the Apache Atlas endpoint. It can be found in the Environment tabs (Environment → Data Lake → Endpoints → Atlas Endpoint).
- User and password - the workload user's login and password
- SSL mode - choose the SSL mode based on your security needs
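Before entering these details in Dataedo, it can help to confirm that the endpoint and workload credentials work. The sketch below only builds the request and makes several assumptions: that the Atlas endpoint ends with `api/atlas/`, that it accepts HTTP Basic authentication, and that the standard Apache Atlas `v2/types/typedefs` resource is exposed. The host and credentials shown are placeholders, and this is not how Dataedo itself connects.

```python
import base64
import urllib.request


def build_atlas_request(host, user, password, path="v2/types/typedefs"):
    """Build a GET request against an Atlas endpoint using HTTP Basic auth."""
    # Join the endpoint and the resource path with exactly one slash.
    url = host.rstrip("/") + "/" + path.lstrip("/")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )


# Placeholder values; replace with the Atlas Endpoint and workload credentials.
request = build_atlas_request(
    "https://atlas.example.com/api/atlas/", "workload_user", "workload_password"
)
# To actually send it (requires network access to the endpoint):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

A successful response would suggest that the Host, User, and Password values are usable; an authentication error points to the workload credentials rather than to Dataedo.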
Setting Service Type and Objects
To import Cloudera Data Catalog correctly, Dataedo needs to know three things:
- Service Type - the technology you would like to import, e.g. Hive, HBase
- Main Object Type - which object of the selected Service Type is the main one, e.g. Hive Database. (Because Cloudera Data Catalog is structured as a graph, Dataedo needs to know which object should be treated as the Database/Storage.)
- Specific Object - the object of the selected type that you want to import, e.g. Database AdventureWorks
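The three selections above can be thought of as a small scope definition. The sketch below models them and turns them into a request body shaped like an Apache Atlas basic search; `ImportScope` and `basic_search_payload` are hypothetical names, the use of `hive_db` as the Atlas type for a Hive database is an assumption about your catalog, and nothing here reflects the queries Dataedo actually sends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportScope:
    """The three choices made on the import screen (hypothetical model)."""
    service_type: str      # technology, e.g. "hive"
    main_object_type: str  # entity type treated as Database/Storage
    specific_object: str   # the single object to import


def basic_search_payload(scope, limit=10):
    """Build a search body that looks up the chosen main object by name."""
    return {
        "typeName": scope.main_object_type,
        "query": scope.specific_object,
        "excludeDeletedEntities": True,
        "limit": limit,
    }


# Example from the text: the Hive database AdventureWorks.
scope = ImportScope("hive", "hive_db", "AdventureWorks")
payload = basic_search_payload(scope)
```

Separating the Main Object Type from the Specific Object reflects the graph structure: the type says which kind of node acts as the root, and the name picks one node of that kind.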
Mapping Cloudera Data Catalog Service to Dataedo
This is a crucial step in importing Cloudera Data Catalog into Dataedo. It allows users to do the following:
- Pick Dataedo object type for every Cloudera entity type imported
- Select whether to import certain Cloudera data assets
- Map attributes
Mapping attributes
Attribute mapping is a separate control that can be accessed by clicking Map attributes. In the form, you can browse all Cloudera attributes and set the Dataedo attribute or field each one should be loaded into. You can also create new Custom Fields and use them for mapping.
After setting up the object and attribute mapping, you can import your Cloudera Data Catalog. Once you map objects for a given service, the mapping can be reused in Import Changes as well as in Copy Connection.
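Conceptually, the mapping combines three decisions per Cloudera entity type: the Dataedo object type, whether to import it, and where each attribute goes. The following sketch shows that idea on a single entity. `EntityMapping`, `apply_mapping`, the entity layout, and the chosen type and field names are all illustrative assumptions, not Dataedo's internal format.

```python
from dataclasses import dataclass, field


@dataclass
class EntityMapping:
    """Mapping for one Cloudera entity type (hypothetical model)."""
    dataedo_object_type: str
    import_enabled: bool = True
    # Cloudera attribute name -> Dataedo field or custom field name
    attributes: dict = field(default_factory=dict)


def apply_mapping(entity, mappings):
    """Return the mapped object, or None if the entity type is not imported."""
    mapping = mappings.get(entity["typeName"])
    if mapping is None or not mapping.import_enabled:
        return None
    fields = {
        target: entity["attributes"][source]
        for source, target in mapping.attributes.items()
        if source in entity["attributes"]
    }
    return {"object_type": mapping.dataedo_object_type, "fields": fields}


mappings = {
    "hive_table": EntityMapping(
        "Table", True, {"name": "Name", "comment": "Description", "owner": "Owner"}
    ),
    "hive_process": EntityMapping("Procedure", import_enabled=False),
}
entity = {
    "typeName": "hive_table",
    "attributes": {"name": "orders", "comment": "Customer orders", "owner": "etl"},
}
mapped = apply_mapping(entity, mappings)
skipped = apply_mapping({"typeName": "hive_process", "attributes": {}}, mappings)
```

Because the mapping is keyed by entity type rather than by individual object, the same definition can be applied again to later imports of the same service, which is the idea behind reusing it in Import Changes and Copy Connection.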
Specifications
Imported objects
Element | Imported | Editable |
---|---|---|
Data assets | ✅ | ✅ |
Technical properties | ✅ | |
User defined properties | | |
Label | | |
Classification | ✅ | |
Business Metadata | ✅ | |
Supported features
Feature | Is supported |
---|---|
Writing changes back | |
Data Profiling | |
CMD Import | ✅ |
PK/FK relationship tester | |
Linked Sources | |
Data lineage
Source | Method | Version |
---|---|---|
Internal lineage (object-level) | REST API | 24.4 (2024) |