Cloudera

Wojtek Bialek - Dataedo Team, 4th December, 2024
Applies to: Dataedo 24.x (current) versions, Article available also for: 10.x, 23.x

Starting with Dataedo 24.4, we support a Cloudera Data Catalog connector, enabling users to import single databases cataloged within Cloudera, making it easier to maintain organized and accessible data assets.

Documenting Cloudera Data Catalog

Dataedo imports the following Cloudera Data Catalog elements:

Cloudera Data Catalog | Dataedo
Data asset | Object
Technical Metadata | Fields + Custom Fields
Classification | Fields + Custom Fields
Business Metadata | Fields + Custom Fields

Data Lineage

  • The Cloudera connector builds lineage only within a single import (it won't create lineage to other technologies or to other Cloudera imports).
  • The Cloudera connector builds object-level lineage.
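
To illustrate what "lineage within a single import" means in practice, here is a minimal Python sketch. It is not Dataedo's implementation; the object names and the `lineage_within_import` helper are hypothetical. It only shows the idea that an object-level lineage edge is kept when both of its ends belong to the same import.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source object, target object)


def lineage_within_import(edges: Iterable[Edge], imported: Set[str]) -> List[Edge]:
    """Keep only the edges whose source and target were both imported in this run."""
    return [(src, dst) for src, dst in edges if src in imported and dst in imported]


# Hypothetical object names, for illustration only.
imported_objects = {"sales.orders", "sales.orders_clean", "sales.daily_totals"}
candidate_edges = [
    ("sales.orders", "sales.orders_clean"),
    ("sales.orders_clean", "sales.daily_totals"),
    ("crm.customers", "sales.orders_clean"),  # source lives outside this import
]

kept = lineage_within_import(candidate_edges, imported_objects)
for src, dst in kept:
    print(f"{src} -> {dst}")
```

In this sketch the edge that starts at `crm.customers` is dropped, because that object was not part of the import.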

Connecting to Cloudera Data Catalog

Add new connection

To connect to a Cloudera Data Catalog instance, create new documentation by clicking Add and choosing New connection.

On the connection screen, choose Cloudera Data Catalog (it can be found under the Catalogs folder).

Provide database connection details:

  • Host - the Apache Atlas endpoint. It can be found in the Environment tabs (Environment → Data Lake → Endpoints → Atlas Endpoint)
  • User and password - the workload user login and password
  • SSL mode - choose the SSL mode based on your security needs
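
Before entering these values in Dataedo, it can help to check that the endpoint and workload credentials are accepted. The sketch below is one possible way to do that from Python and is not part of Dataedo. It assumes that the Atlas Endpoint value ends with `/api/atlas/`, that HTTP basic authentication is enabled for the workload user, and that the standard Apache Atlas v2 path `v2/types/typedefs/headers` is reachable. The host name and credentials are placeholders.

```python
import base64
import urllib.request


def build_atlas_request(host: str, user: str, password: str,
                        path: str = "v2/types/typedefs/headers") -> urllib.request.Request:
    """Build (but do not send) a basic-auth GET request against the Atlas endpoint."""
    url = host.rstrip("/") + "/" + path.lstrip("/")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


# Placeholder values; replace them with your Atlas Endpoint and workload user.
request = build_atlas_request(
    "https://datalake.example.invalid/api/atlas/", "workload_user", "workload_password"
)
print(request.full_url)

# To actually send the request (requires network access to the endpoint):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

A successful response suggests that the same Host, User and password values should work on the Dataedo connection screen. A certificate error at this step is a hint that the SSL mode may need adjusting.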

Setting Service Type and Objects

To import Cloudera Data Catalog correctly, Dataedo needs to know three things:

  • Service Type - the technology you would like to import, e.g. Hive, HBase
  • Main Object Type - which object type of the picked Service Type is the main one, e.g. Hive Database (because Cloudera Data Catalog has a graph structure, Dataedo needs to be told which object should be treated as the Database/Storage)
  • Specific Object - the object of the picked object type that you would like to import, e.g. Database AdventureWorks
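
The three choices above can be pictured as one small record. The Python sketch below is illustrative only: the `ImportSelection` class is hypothetical, and the translation into Apache Atlas basic search parameters (`typeName`, `query`, `limit`) is an assumption about how such a selection could be looked up in Atlas, not a description of what Dataedo sends. `hive_db` is the Atlas type name for a Hive database.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class ImportSelection:
    service_type: str      # technology to import, e.g. "hive"
    main_object_type: str  # entity type treated as Database/Storage, e.g. "hive_db"
    specific_object: str   # the concrete object to import, e.g. "AdventureWorks"


def basic_search_query(selection: ImportSelection, limit: int = 10) -> str:
    """Build a query string for Atlas basic search (v2/search/basic)."""
    return urlencode({
        "typeName": selection.main_object_type,
        "query": selection.specific_object,
        "limit": limit,
    })


selection = ImportSelection("hive", "hive_db", "AdventureWorks")
query_string = basic_search_query(selection)
print(query_string)
```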

Mapping Cloudera Data Catalog Service to Dataedo

This is a crucial step in importing Cloudera Data Catalog into Dataedo. It allows users to do the following:

  • Pick Dataedo object type for every Cloudera entity type imported
  • Select whether to import certain Cloudera data assets
  • Map attributes
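
As a rough mental model of object mapping, the Python sketch below pairs each Cloudera entity type with a Dataedo object type and an import switch. The structure, the entity type names and the chosen Dataedo types are hypothetical examples, not Dataedo's internal format. Treating unmapped entity types as skipped is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ObjectMapping:
    dataedo_type: str     # Dataedo object type chosen for the entity type
    import_enabled: bool  # whether assets of this type are imported at all


# Hypothetical mapping; the names are examples only.
OBJECT_MAPPING: Dict[str, ObjectMapping] = {
    "hive_table": ObjectMapping("Table", True),
    "hive_process": ObjectMapping("Procedure", False),
}


def plan_import(assets: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (asset name, Dataedo type) for every asset whose type is enabled."""
    planned = []
    for name, entity_type in assets:
        mapping = OBJECT_MAPPING.get(entity_type)
        if mapping is not None and mapping.import_enabled:
            planned.append((name, mapping.dataedo_type))
    return planned


assets = [
    ("orders", "hive_table"),
    ("load_orders", "hive_process"),  # mapped, but import is switched off
    ("raw_events", "kafka_topic"),    # not mapped in this sketch
]
planned = plan_import(assets)
print(planned)
```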

Object mapping

Mapping attributes

Attribute mapping is a separate control that can be accessed by clicking Map attributes. In the form, you can browse through all Cloudera attributes and set the Dataedo attribute/field each one should be loaded into. You can also create new Custom Fields and use them for mapping.
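
Conceptually, attribute mapping is a lookup from a Cloudera attribute name to a Dataedo field. The Python sketch below shows that idea with made-up data. The attribute names, the custom field names and the rule that unmapped attributes are left out are assumptions for illustration, not Dataedo's actual behavior.

```python
from typing import Any, Dict

# Hypothetical mapping from Cloudera attribute name to Dataedo field name.
ATTRIBUTE_MAPPING: Dict[str, str] = {
    "comment": "Description",
    "owner": "Data Owner",      # hypothetical custom field
    "tableType": "Table Type",  # hypothetical custom field
}


def map_attributes(cloudera_attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Rename mapped attributes to their Dataedo fields; drop unmapped ones."""
    return {
        ATTRIBUTE_MAPPING[name]: value
        for name, value in cloudera_attributes.items()
        if name in ATTRIBUTE_MAPPING
    }


entity_attributes = {
    "name": "orders",
    "comment": "Customer orders",
    "owner": "etl_user",
    "createTime": 1700000000000,
}
fields = map_attributes(entity_attributes)
print(fields)
```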

After setting up the object and attribute mappings, you can import your Cloudera Data Catalog metadata. Once you have mapped objects for a given service, the mapping can be reused in Import Changes as well as in Copy Connection.

Specifications

Imported objects

Imported Editable
Data assets
Technical properties
User defined properties
Label
Classification
Business Metadata

Supported features

Feature Is supported
Writing changes back
Data Profiling
CMD Import
PK/FK relationship tester
Linked Sources

Data lineage

Source | Method | Version
Internal lineage (object-level) | REST API | 24.4 (2024)