Apache Atlas support

Wojtek Bialek - Dataedo Team, 3rd June, 2024

Apache Atlas is an open-source data catalog. It allows users to organize and manage information about different types of data, such as databases and tables, making it easier to understand and control data assets. Starting with Dataedo 24.2, Dataedo supports an Apache Atlas connector that imports a single database stored within Atlas.

Documenting Apache Atlas

Dataedo imports the following Atlas elements:

Apache Atlas | Dataedo
Entity | Object
Technical Metadata | Fields + Custom Fields
Classification | Fields + Custom Fields
Business Metadata | Fields + Custom Fields

Data Lineage

  • The Apache Atlas connector builds lineage only within a single import (it won't create lineage to other technologies or to other Atlas imports).
  • Lineage is built at the object level.
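To illustrate what "within a single import" means, here is a minimal Python sketch, not Dataedo's implementation. It assumes lineage edges shaped like the `relations` list returned by Atlas's lineage REST endpoint (`fromEntityId`/`toEntityId` GUID pairs) and keeps only the edges whose source and target both belong to the current import. The `internal_lineage` helper and the sample GUIDs are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, List, Set, Tuple


def internal_lineage(relations: Iterable[dict], imported_guids: Set[str]) -> List[Tuple[str, str]]:
    """Keep object-level lineage edges whose source and target were both imported."""
    edges: List[Tuple[str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    for rel in relations:
        src, dst = rel["fromEntityId"], rel["toEntityId"]
        # Edges leading outside this import (other technologies or other Atlas
        # imports) are dropped, as are duplicate edges.
        if src in imported_guids and dst in imported_guids and (src, dst) not in seen:
            seen.add((src, dst))
            edges.append((src, dst))
    return edges
```

For example, with imported GUIDs `{"t1", "p1", "t2"}`, an edge from `t2` to an entity `x9` from another import would be discarded, while `t1 -> p1` and `p1 -> t2` would be kept.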

Connecting to Atlas

Add new connection

To connect to an Apache Atlas instance, create new documentation by clicking Add and choosing New connection.

On the connection screen, choose Apache Atlas (it can be found in the Catalogs folder).

Provide the connection details:

  • Host - provide the host name or address of the server where Apache Atlas is running, e.g. server17, server17.ourdomain.com or 192.168.0.37
  • Port - specify the port number where Apache Atlas is running
  • User and password - provide the user name and password used to authenticate to Apache Atlas
  • SSL mode - choose the SSL mode based on your security needs
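The fields above combine into a REST endpoint address. The following Python sketch is an assumption, not a description of how Dataedo connects: it builds a base URL from Host, Port and SSL mode (simplified here to on/off) and prepares a request to Atlas's `/api/atlas/admin/version` endpoint using HTTP Basic authentication. The helper names are hypothetical, and the request is only built, not sent.

```python
from __future__ import annotations

import base64
import urllib.request


def atlas_base_url(host: str, port: int, use_ssl: bool) -> str:
    """Build the Atlas REST base URL from the connection fields."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}/api/atlas"


def version_request(host: str, port: int, user: str, password: str,
                    use_ssl: bool) -> urllib.request.Request:
    """Prepare (but do not send) a GET to admin/version with HTTP Basic auth."""
    url = atlas_base_url(host, port, use_ssl) + "/admin/version"
    # Basic auth header: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"},
                                  method="GET")
```

Passing the returned request to `urllib.request.urlopen` would send it; a successful response suggests the host, port and credentials are correct. Port 21000 is a common Atlas default, but your instance may use a different one.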

Saving password

You can save the password for later connections by checking the Save password option. Passwords are saved in the repository database.

Setting Service Type and Objects

To import Apache Atlas correctly, Dataedo needs to know three things:

  • Service Type - the technology you would like to import, e.g. Hive or HBase
  • Main Object Type - which object type in the selected service is the main one, e.g. Hive Database. Because Apache Atlas has a flat structure, Dataedo needs to know which object should be treated as the database (source) for all objects in the service.
  • Specific Object - the object of the selected type that you would like to import, e.g. the AdventureWorks database

After setting this up, the Atlas import is ready to begin.
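Because Atlas stores entities in a flat list, the three settings act as a filter that picks one root object and everything under it. The Python sketch below models that idea under simplifying assumptions: each entity carries a hypothetical `parent_guid` link to its container, and entity type names use the service as a prefix (as in `hive_db` and `hive_table`). `AtlasEntity`, `ImportScope` and `select_scope` are illustrative names, not Dataedo or Atlas APIs.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AtlasEntity:
    guid: str
    type_name: str                     # Atlas type, e.g. "hive_db", "hive_table"
    name: str
    parent_guid: Optional[str] = None  # hypothetical link to the containing entity


@dataclass
class ImportScope:
    service_type: str      # e.g. "hive"
    main_object_type: str  # e.g. "hive_db"
    specific_object: str   # e.g. "AdventureWorks"


def select_scope(entities: List[AtlasEntity], scope: ImportScope) -> List[AtlasEntity]:
    """Return the chosen main object and every entity nested under it."""
    # Assumption: entity types of a service share the "<service>_" prefix.
    in_service = [e for e in entities if e.type_name.startswith(scope.service_type + "_")]
    roots = [e for e in in_service
             if e.type_name == scope.main_object_type and e.name == scope.specific_object]
    if not roots:
        raise LookupError(f"{scope.main_object_type} {scope.specific_object!r} not found")
    root = roots[0]

    # Index entities by their (hypothetical) parent link.
    children: Dict[Optional[str], List[AtlasEntity]] = {}
    for e in in_service:
        children.setdefault(e.parent_guid, []).append(e)

    # Breadth-first walk from the main object; the visited set guards against cycles.
    selected, seen, queue = [root], {root.guid}, deque([root.guid])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child.guid not in seen:
                seen.add(child.guid)
                selected.append(child)
                queue.append(child.guid)
    return selected
```

With Service Type `hive`, Main Object Type `hive_db` and Specific Object `AdventureWorks`, the sketch returns that database plus its tables and columns, and ignores other databases and other services.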

Mapping Atlas Objects to Dataedo

This is a crucial step in importing Apache Atlas into Dataedo. It allows you to:

  • Pick Dataedo object type for every Atlas entity type imported
  • Select whether to import certain Atlas entities
  • Map attributes


Apart from editing the mapping, you can see what percentage of Atlas attributes has been mapped to Dataedo. After mapping attributes for one object type, you can right-click to Copy and then Paste the mapping to other object types.
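The mapping percentage and the copy/paste behaviour can be pictured with a small Python sketch. It assumes, hypothetically, that a mapping is a dictionary from Atlas attribute name to Dataedo field (`None` when unmapped), and that pasting overwrites only the attributes the target object type also has. The function names and field names are illustrative only.

```python
from __future__ import annotations

from typing import Dict, Optional

# Hypothetical format: Atlas attribute name -> Dataedo field name (None = not mapped).
AttributeMapping = Dict[str, Optional[str]]


def mapped_percentage(mapping: AttributeMapping) -> float:
    """Percentage of Atlas attributes that have a Dataedo target."""
    if not mapping:
        return 0.0
    mapped = sum(1 for target in mapping.values() if target is not None)
    return 100.0 * mapped / len(mapping)


def paste_mapping(source: AttributeMapping, target: AttributeMapping) -> AttributeMapping:
    """Copy targets from `source` onto `target` for attributes both object types share."""
    return {attr: source[attr] if attr in source else current
            for attr, current in target.items()}
```

For instance, a table mapping with two of four attributes mapped reports 50%, and pasting it onto a type that shares two of its three attributes raises that type to roughly 66.7%.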

Mapping attributes

Attribute mapping is a separate form, which you can open by clicking Map attributes. In this form you can browse through all Atlas attributes and set the Dataedo attribute or field each one should be loaded into. You can also create new custom fields and use them for mapping.
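As a rough illustration of where mapped values end up, this Python sketch routes an entity's Atlas attributes either to built-in Dataedo fields or to custom fields, and skips unmapped attributes. The mapping format, the `apply_mapping` helper and the field names are hypothetical; in Dataedo this step is configured through the Map attributes form.

```python
from __future__ import annotations

from typing import Any, Dict, Set, Tuple


def apply_mapping(
    attributes: Dict[str, Any],
    mapping: Dict[str, str],
    builtin_fields: Set[str],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split an entity's Atlas attributes into built-in and custom Dataedo fields."""
    builtin: Dict[str, Any] = {}
    custom: Dict[str, Any] = {}
    for atlas_attr, value in attributes.items():
        target = mapping.get(atlas_attr)
        if target is None:
            continue  # attribute not mapped: not imported
        if target in builtin_fields:
            builtin[target] = value
        else:
            custom[target] = value  # any non-built-in target is treated as a custom field
    return builtin, custom
```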


After setting up object and attribute mapping, you can import your Atlas metadata. Once you map objects for a certain service, the mapping can be reused in Import Changes as well as in Copy Connection.

Specifications

Imported objects

Imported Editable
Entities
Technical properties
User defined properties
Label
Classification
Business Metadata

Supported features

Feature Is supported
Writing changes back
Data Profiling
CMD Import
PK/FK relationship tester
Linked Sources

Data lineage

Source | Method | Version
Internal lineage (object-level) | REST API | 24.2 (2024)

Plans for future releases

  • import terms and glossaries
  • import object labels
  • import user defined properties
  • import multiple services at once
  • import propagated classifications
  • lineage improvements, including lineage across multiple Apache Atlas documentations