Documenting dbt Cloud

Hubert Książek - Dataedo Team, 22nd December, 2022
Applies to: Dataedo 10.x versions. Article also available for: 24.x (current), 23.x.

dbt is a tool that helps data teams transform data in their warehouses by simply writing select statements, which lets analysts work more like software engineers. dbt Cloud is a hosted service that helps data analysts and engineers productionize dbt deployments. It comes with turnkey support for scheduling jobs, CI/CD, serving documentation, monitoring, alerting, and an integrated development environment (IDE).

Supported elements and metadata

All objects from dbt are imported as views in Dataedo; the type reflects the actual materialization. Dataedo automatically creates data lineage between dbt objects (always) and between dbt objects and sources (only if you connect the data warehouse used in this dbt project). Two dbt tests are also supported. If a column is tested with not_null in dbt, the Nullable marker in Dataedo will be unchecked. If a column is tested with unique, the corresponding constraint will be added.
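
To make the test mapping concrete, here is a minimal Python sketch that reads not_null and unique tests from a parsed manifest.json. It is an illustration only, not Dataedo's actual import logic. It assumes the usual manifest layout, in which generic tests are nodes with a test_metadata entry and a depends_on list. The function name and the sample project are hypothetical.

```python
def column_flags_from_tests(manifest):
    """Collect not_null / unique tests per (parent unique_id, column name).

    Assumes the common manifest.json layout: generic tests are nodes with
    resource_type "test", a test_metadata block, and depends_on.nodes.
    """
    flags = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "test":
            continue
        metadata = node.get("test_metadata") or {}
        test_name = metadata.get("name")
        if test_name not in ("not_null", "unique"):
            continue
        # The tested column is usually present on the node and in the kwargs.
        column = node.get("column_name") or metadata.get("kwargs", {}).get("column_name")
        if column is None:
            continue
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            flags.setdefault((parent_id, column), set()).add(test_name)
    return flags


# Hypothetical, heavily trimmed manifest used only for illustration.
sample_manifest = {
    "nodes": {
        "model.shop.orders": {"resource_type": "model"},
        "test.shop.not_null_orders_order_id": {
            "resource_type": "test",
            "column_name": "order_id",
            "test_metadata": {"name": "not_null", "kwargs": {"column_name": "order_id"}},
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
        "test.shop.unique_orders_order_id": {
            "resource_type": "test",
            "column_name": "order_id",
            "test_metadata": {"name": "unique", "kwargs": {"column_name": "order_id"}},
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    }
}

for (parent_id, column), tests in sorted(column_flags_from_tests(sample_manifest).items()):
    nullable = "not_null" not in tests   # not_null test -> Nullable unchecked
    unique = "unique" in tests           # unique test -> unique constraint
    print(f"{parent_id}.{column}: nullable={nullable}, unique={unique}")
```

Running this against a real manifest.json (loaded with json.load) is a quick way to preview which columns should end up non-nullable or unique after the import.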

To have the automatic data lineage generated, refer to other objects using Jinja notation, e.g. {{ source(source_name, table_name) }} or {{ ref(package_name, model_name) }}.
In other words, a lineage is imported into Dataedo only if it also appears in the documentation generated by dbt.
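
The sketch below shows why the Jinja notation matters. dbt records a dependency in manifest.json only for objects referenced through ref or source, so a model that hard-codes a table name has no parents to build lineage from. This is a minimal sketch assuming the usual depends_on.nodes field of the manifest. The function name and the sample project are hypothetical, and it does not describe how Dataedo reads the file internally.

```python
def lineage_edges(manifest):
    """Return sorted (parent unique_id, child unique_id) pairs.

    Assumes each node lists its parents under depends_on.nodes, which dbt
    fills in from ref() and source() calls.
    """
    edges = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") not in ("model", "seed", "snapshot"):
            continue
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent_id, unique_id))
    return sorted(edges)


# Hypothetical, heavily trimmed manifest used only for illustration.
lineage_manifest = {
    "sources": {"source.shop.raw.orders": {"resource_type": "source"}},
    "nodes": {
        # Built with {{ source('raw', 'orders') }}
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        # Built with {{ ref('stg_orders') }}
        "model.shop.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        # Built with a hard-coded table name: dbt records no parents.
        "model.shop.hardcoded": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
    },
}

for parent_id, child_id in lineage_edges(lineage_manifest):
    print(f"{parent_id} -> {child_id}")
```

In this sample, model.shop.hardcoded produces no edge, which mirrors the rule above: no lineage in dbt, no lineage in Dataedo.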

Prerequisites

  • A dbt Cloud Service Token with at least read-only permissions.
  • The Run Id of a run that contains the manifest.json file and, for best results, the catalog.json file as well.
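
If you want to check both prerequisites before the import, the following Python sketch builds requests for the run artifacts through the dbt Cloud API v2. It is a sketch under stated assumptions: the host cloud.getdbt.com applies to the multi-tenant US deployment and may differ for your account, and the account ID, Run Id, and token values are placeholders. The helper names are hypothetical.

```python
import json
import urllib.request


def artifact_request(host, account_id, run_id, path, token):
    """Build a GET request for one artifact of a dbt Cloud run (API v2)."""
    url = f"https://{host}/api/v2/accounts/{account_id}/runs/{run_id}/artifacts/{path}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def fetch_artifact(request):
    """Send the request and parse the JSON artifact (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values; replace them with your own account ID, Run Id, and token.
for artifact in ("manifest.json", "catalog.json"):
    request = artifact_request("cloud.getdbt.com", 1, 12345678, artifact, "YOUR_SERVICE_TOKEN")
    print(request.full_url)
    # manifest = fetch_artifact(request)  # uncomment to download the file
```

An HTTP 401 or 403 response usually points to a token problem, while a 404 for catalog.json suggests that the run did not execute dbt docs generate.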

Preparations

Creating a service token

In the upper-right corner, click the gear icon, then select Account Settings.

In the left panel, select Service Token, then click + New Token.

Name the token, add member permissions, select the projects you want to import, and click Save.

Copy the token and save it in a safe place. Once you close this window, you will no longer be able to see the token value in dbt Cloud.

Create a run and read its Run Id

Select Deploy, then Jobs.

In the upper right corner, click on Create Job.

Name the job and specify the environment. In the command field, enter dbt docs generate. You can uncheck Run on schedule. Then click Save.

In the upper right corner, click on Run now.

When the run has finished, click on it.

In the field marked with an arrow, you can read the Run Id; here it is 12345678 (without the hash sign). Save it, because you will need it for the import.
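
If you copy the Run Id from the dbt Cloud page programmatically or paste it into a script, a small helper can strip the leading hash sign and reject anything that is not a number. This is a minimal sketch; the function name is hypothetical.

```python
def normalize_run_id(text):
    """Turn a copied value such as '#12345678' into the integer 12345678."""
    cleaned = text.strip().lstrip("#")
    if not cleaned.isdecimal():
        raise ValueError(f"not a valid Run Id: {text!r}")
    return int(cleaned)


print(normalize_run_id("#12345678"))
```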

Connect Dataedo to the dbt project

To connect to the dbt project, create new documentation by clicking Add and choosing Database connection.

Choose dbt (beta) from the list and click Next >.

Select dbt Cloud (beta) and click Next >.

Paste the Service Token, click the three dots next to Account, and choose the account. Paste the Run Id. For best results, use the Database field to select the data warehouse in which dbt was running, provided it is already documented in Dataedo. Then click Connect.

In the next window, choose which objects to import into Dataedo. If you want to omit some objects in bulk, refer to the advanced import filter. After selecting the objects, click Next >.

Add a document title and click Import.

Outcome

The dbt project has been imported into new documentation, and automatic data lineage has been created.
