Documenting dbt Core

Hubert Książek - Dataedo Team, 19th July, 2022
Applies to: Dataedo 10.x versions, Article available also for: 24.x (current), 23.x

dbt Core™ is an open-source command-line tool that enables data teams to transform data in their warehouses by writing select statements, which lets analysts work more like software engineers. Dataedo supports documenting dbt projects that you can access locally. Support for connecting to dbt Cloud™ will be added in future releases.

Supported elements and metadata

All objects from dbt are displayed as views in Dataedo; the object type indicates the actual materialization. Dataedo automatically creates data lineage between dbt objects (always) and sources (only if you connect the data warehouse used in this dbt project). Dataedo also supports two dbt tests: if a column has a not_null test, the Nullable marker in Dataedo is unchecked, and if a column has a unique test, the corresponding constraint is added.

To have the data lineage generated automatically, use Jinja notation when you refer to other objects, i.e. {{ source(source_name, table_name) }} or {{ ref(package_name, model_name) }}.
Lineage is imported into Dataedo only if it also appears in the documentation generated by dbt.
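
To see which dependencies dbt itself has recorded, you can inspect manifest.json before importing. The following is a minimal sketch, not Dataedo's import logic. It assumes each entry under "nodes" in the manifest lists its upstream objects under "depends_on" and then "nodes", which you should verify against the manifest produced by your dbt version. The function name lineage_edges and the sample object names are hypothetical.

```python
import json
from pathlib import Path


def lineage_edges(manifest):
    """Return sorted (upstream, downstream) pairs found in a parsed manifest.json."""
    edges = []
    for node_id, node in manifest.get("nodes", {}).items():
        # Assumption: upstream dependencies are listed under depends_on -> nodes.
        for upstream in node.get("depends_on", {}).get("nodes", []):
            edges.append((upstream, node_id))
    return sorted(edges)


# Hypothetical in-memory sample; for a real project you would load the file, e.g.
# manifest = json.loads(Path("target/manifest.json").read_text(encoding="utf-8"))
sample_manifest = {
    "nodes": {
        "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.raw.orders"]}},
        "model.shop.orders": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
    }
}

for upstream, downstream in lineage_edges(sample_manifest):
    print(f"{upstream} -> {downstream}")
```

An object that references another one with a hard-coded name instead of ref or source would have no entry in such a list, which is consistent with the lineage not being imported.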

Prerequisites

You need access to the folder on your computer where the dbt project is located. The folder must contain the dbt_project.yml file, in which the correct target-path name is specified (target by default). The folder that target-path points to must contain the catalog.json and manifest.json files (as a last resort, manifest.json alone will suffice).
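
The prerequisites above can be checked with a short script before you open Dataedo. This is a minimal sketch under stated assumptions: the function name check_dbt_project is hypothetical, and target-path is read with a naive line-based lookup rather than a real YAML parser, so it only handles a plain top-level target-path entry.

```python
import re
from pathlib import Path


def check_dbt_project(project_dir):
    """Return a list of problems; an empty list means the folder looks ready."""
    project = Path(project_dir)
    project_file = project / "dbt_project.yml"
    if not project_file.is_file():
        return ["dbt_project.yml not found"]

    # Naive lookup of a top-level "target-path:" key (assumption: simple scalar value).
    text = project_file.read_text(encoding="utf-8")
    match = re.search(r"^target-path:[ \t]*(.+?)[ \t]*(?:#.*)?$", text, re.MULTILINE)
    # Fall back to the default folder name "target" when the key is absent.
    target_name = match.group(1).strip("\"'") if match else "target"
    target_dir = project / target_name

    problems = []
    if not (target_dir / "manifest.json").is_file():
        problems.append(f"manifest.json missing in {target_name}")
    if not (target_dir / "catalog.json").is_file():
        problems.append(f"catalog.json missing in {target_name}")
    return problems
```

Calling check_dbt_project with the path to your project folder returns an empty list when both files are present. A result that mentions only catalog.json corresponds to the last-resort case in which manifest.json alone is available.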

Creating a catalog.json file

The catalog.json file is created by the dbt docs generate command. Open a terminal, navigate to the folder where the dbt project is located, and run dbt docs generate. If everything is configured properly, the catalog.json file is generated. If you have problems, refer to the dbt documentation.
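
If you prefer to script this step, the same command can be run from Python. The sketch below is illustrative only: generate_catalog is a hypothetical helper, it assumes the dbt executable is on your PATH, and it assumes the default target folder unless you pass a different target_path.

```python
import subprocess
from pathlib import Path


def generate_catalog(project_dir, command=("dbt", "docs", "generate"), target_path="target"):
    """Run the docs command inside the project folder and return the catalog.json path."""
    project = Path(project_dir)
    # check=True raises CalledProcessError if the command exits with a non-zero status.
    subprocess.run(list(command), cwd=project, check=True)
    catalog = project / target_path / "catalog.json"
    if not catalog.is_file():
        raise FileNotFoundError(f"{catalog} was not created")
    return catalog
```

The command is a parameter so that the helper can be exercised without dbt installed; in normal use you would call generate_catalog with only the project folder.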

Connect Dataedo to dbt project

To connect to a dbt project, create new documentation by clicking Add documentation and choosing Database connection.
In the Add documentation window, choose dbt (beta) and click Next >.
On the next screen, click the three dots next to the path field, find the dbt project folder (it must contain the dbt_project.yml file), select it, and confirm with Ok.
For best results, add the data warehouse in which dbt was running, provided it is already documented in Dataedo. To do this, click the dropdown next to the Database (optional) field and select that database.
Click the Connect button at the bottom right of the window.

In the next window you will be asked which objects you want to import into Dataedo. If you want to omit some objects in bulk, refer to the advanced import filter. After selecting the objects to be imported, click Next >.
Add a documentation title and click Import.

Outcome

The dbt project has been imported into new documentation, and the automatic data lineage has been created.