
Cloudera

Starting with Dataedo 24.4, we support a Cloudera Data Catalog connector that lets users import individual databases cataloged in Cloudera, which makes it easier to keep data assets organized and accessible.

Documenting Cloudera Data Catalog

Dataedo imports the following Cloudera Data Catalog elements:

Cloudera Data Catalog | Dataedo
Data asset | Object
Technical Metadata | Fields + Custom Fields
Classification | Fields + Custom Fields
Business Metadata | Fields + Custom Fields
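To illustrate how one cataloged asset splits into the groups in the table above, here is a minimal sketch. It assumes the entity arrives in the Apache Atlas v2 JSON shape (typeName, attributes, classifications, businessAttributes); the function name and output keys are hypothetical and do not describe Dataedo's actual loader.

```python
from typing import Any


def flatten_entity(entity: dict[str, Any]) -> dict[str, Any]:
    """Split an Atlas-style entity into object, technical, classification
    and business metadata groups (hypothetical output layout)."""
    attributes = entity.get("attributes", {})
    return {
        # The data asset itself becomes a Dataedo object.
        "object": {"type": entity.get("typeName"), "name": attributes.get("name")},
        # Technical metadata: the plain Atlas attributes, minus the name.
        "technical": {k: v for k, v in attributes.items() if k != "name"},
        # Classifications: only the classification type names are kept.
        "classifications": [c["typeName"] for c in entity.get("classifications", [])],
        # Business metadata: flattened to "<metadata set>.<attribute>" keys.
        "business": {
            f"{bm_name}.{attr}": value
            for bm_name, bm_attrs in entity.get("businessAttributes", {}).items()
            for attr, value in bm_attrs.items()
        },
    }
```

Each resulting group would then be loaded into regular fields or Custom Fields, as listed in the table.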

Data Lineage

  • The Cloudera connector builds lineage only within a single import; it does not create lineage to other technologies or to other Cloudera imports.
  • The Cloudera connector builds object-level lineage.
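The "single import" rule above can be pictured as an edge filter: a lineage edge is kept only when both of its ends were imported in the same run. This is a minimal sketch, assuming the lineage data follows the Apache Atlas v2 lineage response (a "relations" list of fromEntityId/toEntityId pairs); the function name is hypothetical.

```python
def internal_lineage(lineage: dict, imported_guids: set[str]) -> list[tuple[str, str]]:
    """Keep only object-level lineage edges whose both ends belong to this import."""
    edges = []
    for rel in lineage.get("relations", []):
        src, dst = rel["fromEntityId"], rel["toEntityId"]
        # Edges pointing outside the import (other technologies or other
        # Cloudera imports) are dropped instead of being linked.
        if src in imported_guids and dst in imported_guids:
            edges.append((src, dst))
    return edges
```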

Connecting to Cloudera Data Catalog

Add new connection

To connect to a Cloudera Data Catalog instance, create new documentation by clicking Add and choosing New connection.

On the connection screen, choose Cloudera Data Catalog (it can be found under the Catalogs folder).

Provide database connection details:

  • Host - the Apache Atlas endpoint. You can find it in your environment's details (Environment → Data Lake → Endpoints → Atlas Endpoint).
  • User and password - the workload user's login and password.
  • SSL mode - choose the SSL mode based on your security needs.
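Before entering these details in Dataedo, it can help to check that the Atlas endpoint and workload credentials work together. The sketch below only builds the request; it assumes HTTP Basic authentication with the workload user and uses Apache Atlas's /api/atlas/admin/version path. The example host is hypothetical, and this is not how Dataedo itself connects.

```python
import base64
import urllib.request


def build_atlas_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the Atlas version endpoint.

    Assumes Basic authentication with the workload user's credentials.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    url = host.rstrip("/") + "/api/atlas/admin/version"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
```

To actually send it, pass the request to urllib.request.urlopen together with an SSL context that matches the SSL mode you plan to select.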

Setting Service Type and Objects

To import Cloudera Data Catalog correctly, Dataedo needs three pieces of information:

  • Service Type - the technology you want to import, e.g., Hive or HBase.
  • Main Object Type - the object type within the selected Service Type that acts as the top-level container, e.g., Hive Database. Because Cloudera Data Catalog stores metadata as a graph, Dataedo needs to know which object to treat as the Database/Storage.
  • Specific Object - the object of the selected type that you want to import, e.g., the AdventureWorks database.
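The three settings above can be read as a way of cutting one tree out of the catalog graph: the Main Object Type and Specific Object select a single root, and everything reachable from it forms the import. This is a minimal sketch using a simplified, hypothetical graph layout (guid mapped to typeName, name and children). The type name "hive_db" follows Atlas naming and already implies the Hive service type.

```python
from collections import deque


def collect_scope(graph: dict, main_type: str, object_name: str) -> set[str]:
    """Return the GUIDs reachable from the single chosen main object.

    graph: guid -> {"typeName": str, "name": str, "children": [guid, ...]}
    (a hypothetical stand-in for the Atlas entity graph).
    """
    # Main Object Type + Specific Object select exactly one root.
    roots = [
        guid for guid, node in graph.items()
        if node["typeName"] == main_type and node["name"] == object_name
    ]
    if len(roots) != 1:
        raise ValueError(f"expected one {main_type} named {object_name!r}, found {len(roots)}")
    # Breadth-first walk; the visited set also protects against cycles.
    scope: set[str] = set()
    queue = deque(roots)
    while queue:
        guid = queue.popleft()
        if guid in scope:
            continue
        scope.add(guid)
        queue.extend(graph[guid].get("children", []))
    return scope
```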

Mapping Cloudera Data Catalog Service to Dataedo

This is a crucial step in importing Cloudera Data Catalog into Dataedo. It allows users to do the following:

  • Pick Dataedo object type for every Cloudera entity type imported.
  • Select whether to import certain Cloudera data assets.
  • Map attributes.
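The first two mapping options above, picking a Dataedo object type and deciding whether to import a Cloudera entity type at all, can be modeled as a small rule table. This is a minimal sketch; ObjectMapping, HIVE_MAPPING and the Dataedo type labels are hypothetical and not Dataedo's internal names.

```python
from dataclasses import dataclass


@dataclass
class ObjectMapping:
    dataedo_type: str  # hypothetical Dataedo object type label
    import_it: bool    # whether assets of this Cloudera type are imported


# Hypothetical mapping for the Hive service.
HIVE_MAPPING = {
    "hive_table": ObjectMapping("Table", True),
    "hive_process": ObjectMapping("Process", False),
}


def apply_mapping(entities: list[dict], mapping: dict[str, ObjectMapping]) -> list[tuple[str, str]]:
    """Return (name, Dataedo type) pairs for the entities selected for import."""
    result = []
    for entity in entities:
        rule = mapping.get(entity["typeName"])
        # Unmapped entity types and types switched off are skipped.
        if rule is None or not rule.import_it:
            continue
        result.append((entity["name"], rule.dataedo_type))
    return result
```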

Mapping attributes

Mapping attributes is a separate control, opened by clicking Map attributes. In this form you can browse all Cloudera attributes and choose the Dataedo attribute or field each one should be loaded into. You can also create new Custom Fields and use them as mapping targets.

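Conceptually, the Map attributes form routes each Cloudera attribute value either to a built-in Dataedo field, to a Custom Field, or nowhere. This is a minimal sketch of that routing; the function, the attribute names and the field names are hypothetical examples.

```python
def map_attributes(
    attributes: dict, attribute_map: dict[str, str], custom_fields: set[str]
) -> tuple[dict, dict]:
    """Split Cloudera attribute values into built-in fields and Custom Fields.

    attribute_map: Cloudera attribute name -> Dataedo field name (hypothetical).
    custom_fields: names of Custom Fields created for mapping (hypothetical).
    """
    fields, custom = {}, {}
    for name, value in attributes.items():
        target = attribute_map.get(name)
        if target is None:
            continue  # attribute left unmapped, so it is not loaded
        if target in custom_fields:
            custom[target] = value
        else:
            fields[target] = value
    return fields, custom
```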

After setting up the object and attribute mappings, you can import your Cloudera Data Catalog. Once you have mapped objects for a given service, the mapping can be reused in Import Changes and in Copy Connection.

Specifications

Imported objects

Object | Imported | Editable
Data assets
Technical properties
User defined properties
Label
Classification
Business Metadata

Supported features

Feature | Is supported
Writing changes back
Data Profiling
CMD Import
PK/FK relationship tester
Linked Sources

Data lineage

Source | Method | Version
Internal lineage (object-level) | REST API | 24.4 (2024)
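Since internal lineage is read through a REST API, the sketch below shows what a lineage request against Apache Atlas's v2 lineage endpoint could look like for one entity. The assumption that the connector uses this exact endpoint and these defaults is ours; the host and GUID are hypothetical.

```python
from urllib.parse import quote, urlencode


def lineage_url(host: str, guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Build an Apache Atlas v2 lineage URL for one entity.

    direction is INPUT, OUTPUT or BOTH; depth limits how many hops are returned.
    """
    if direction not in {"INPUT", "OUTPUT", "BOTH"}:
        raise ValueError(f"unsupported direction: {direction}")
    query = urlencode({"direction": direction, "depth": depth})
    return f"{host.rstrip('/')}/api/atlas/v2/lineage/{quote(guid)}?{query}"
```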