Documenting dbt Core

Applies to: Dataedo 23.x (current) versions, Article available also for: 10.x

dbt Core™ is an open-source command-line tool that lets data teams transform data in their warehouses by writing select statements, so that analysts can work more like software engineers. Dataedo supports documenting dbt projects that you can access locally. Support for connecting to dbt Cloud™ will be added in future releases.

Supported metadata and data lineage

Imported metadata

Imported Editable
Tables, Views, Materialized Views, Incremental, Ephemeral
  Table comments
  Columns
   Data types
   Nullability (from not_null test)
   Foreign keys (from relationships test)
   Unique keys (from unique test)
   Column comments

All objects from dbt are displayed as views in Dataedo; the subtype indicates the actual materialization.

Data Lineage

Source Method Status
dbt object - dbt object From Manifest.json
database object - dbt object From Manifest.json

Dataedo always creates automatic data lineage between dbt objects. It also creates lineage between a dbt object and its related database object, but only if you connect the data warehouse used in the dbt project.

To have the automatic data lineage generated, refer to other objects using Jinja notation, e.g. {{ source(source_name, table_name) }} or {{ ref(package_name, model_name) }}.
Only lineage that appears in the documentation generated by dbt is imported into Dataedo.
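To preview which relationships dbt has recorded before importing, you can inspect manifest.json yourself. The sketch below is only an illustration and not how Dataedo reads the file: it assumes the manifest layout in which each entry under "nodes" lists its parents in "depends_on" > "nodes". The sample manifest and the names shop, stg_orders and orders are made up.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Read a manifest.json file into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def lineage_edges(manifest):
    """Return sorted (upstream, downstream) pairs found in a dbt manifest."""
    edges = []
    for node_id, node in manifest.get("nodes", {}).items():
        # Each node lists the unique ids of the objects it selects from.
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent_id, node_id))
    return sorted(edges)


# Hypothetical, heavily trimmed manifest used only for illustration.
sample_manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "depends_on": {"nodes": ["source.shop.raw.orders"]}
        },
        "model.shop.orders": {
            "depends_on": {"nodes": ["model.shop.stg_orders"]}
        },
    }
}

for upstream, downstream in lineage_edges(sample_manifest):
    print(f"{upstream} -> {downstream}")
```

For a real project, pass the result of load_manifest("target/manifest.json") instead of the sample dictionary. A model that refers to a table by its literal name rather than through ref or source has no entry in "depends_on", so no edge appears for it.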

Prerequisites

You need access to the folder on your computer where the dbt project is located. The folder must contain the dbt_project.yml file, in which the correct target-path name is specified (target by default). The folder that target-path points to must contain the catalog.json and manifest.json files (as a last resort, manifest.json alone will suffice).
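The prerequisites above can be checked with a short script before you open Dataedo. This is a minimal sketch under a few assumptions: target-path is written as a plain top-level key in dbt_project.yml, and the helper names find_target_path and check_dbt_project are invented for this example. It uses a simple line scan instead of a YAML parser to stay dependency-free.

```python
import tempfile
from pathlib import Path


def find_target_path(project_dir):
    """Return the target-path set in dbt_project.yml, or 'target' if unset."""
    text = (Path(project_dir) / "dbt_project.yml").read_text(encoding="utf-8")
    for line in text.splitlines():
        # Assumption: target-path is a simple, unindented top-level key.
        if line.startswith("target-path:"):
            value = line.split(":", 1)[1].split("#", 1)[0].strip().strip("'\"")
            if value:
                return value
    return "target"


def check_dbt_project(project_dir):
    """Report which of the files described above are present."""
    project_dir = Path(project_dir)
    report = {"dbt_project.yml": (project_dir / "dbt_project.yml").is_file()}
    if not report["dbt_project.yml"]:
        return report
    target_dir = project_dir / find_target_path(project_dir)
    report["target_path"] = target_dir.name
    report["manifest.json"] = (target_dir / "manifest.json").is_file()
    report["catalog.json"] = (target_dir / "catalog.json").is_file()
    return report


# Demonstration on a throwaway folder that has a manifest but no catalog.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "dbt_project.yml").write_text(
        "name: demo\ntarget-path: 'build'\n", encoding="utf-8"
    )
    (project / "build").mkdir()
    (project / "build" / "manifest.json").write_text("{}", encoding="utf-8")
    report = check_dbt_project(project)

print(report)
```

In the demonstration, catalog.json is reported as missing, which corresponds to the "last resort" case where only manifest.json is available.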

Creating a catalog.json file

The catalog.json file is created by the dbt docs generate command. Open the terminal, navigate to the folder where the dbt project is located, and run dbt docs generate. If everything is configured properly, the catalog.json file will be generated. If you run into problems, refer to the dbt documentation.
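If you prefer to script this step, the same command can be run from Python. The sketch below assumes the dbt executable is on your PATH and that the project uses the default target folder; the function name generate_catalog and its runner parameter are invented for this example (runner exists only so the command call can be replaced in tests).

```python
import subprocess
from pathlib import Path


def generate_catalog(project_dir, target_path="target", runner=subprocess.run):
    """Run `dbt docs generate` in project_dir and return the catalog.json path."""
    project_dir = Path(project_dir)
    # Equivalent to opening a terminal in the project folder and running
    # the command by hand; check=True raises if dbt exits with an error.
    runner(["dbt", "docs", "generate"], cwd=project_dir, check=True)
    catalog = project_dir / target_path / "catalog.json"
    if not catalog.is_file():
        raise FileNotFoundError(f"dbt finished but {catalog} was not created")
    return catalog


# Example call (the path is a placeholder for your own project folder):
# generate_catalog("path/to/your/dbt/project")
```

Pass a different target_path if your dbt_project.yml overrides the default.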

Connect Dataedo to dbt project

To connect to a dbt project, create a new documentation by clicking Add documentation and choosing Database connection.
In the Add documentation window, choose dbt (beta) and click Next >.
On the next screen, click the three dots next to the path field, find the dbt project folder (it must contain the dbt_project.yml file), select it, and confirm with Ok.
For best results, add the data warehouse in which dbt was running, provided it is already documented in Dataedo. To do this, open the dropdown next to the Database (optional) field and select that database.
Click the Connect button at the bottom right of the window.

In the next window you will be asked which objects you want to import into Dataedo. If you want to omit some objects in bulk, refer to the advanced import filter. After selecting the objects to be imported, click Next >.
Add a documentation title and click Import.

Outcome

The dbt project has been imported into a new documentation, and automatic data lineage has been created.
