dbt Core™ is an open source command line tool that enables data teams to transform data in their warehouses by simply writing select statements; it lets analysts work more like software engineers. Dataedo supports documenting dbt projects that you can access locally. Support for connecting to dbt Cloud™ will be added in future releases.
Supported metadata and data lineage
Imported metadata
Metadata | Imported | Editable |
---|---|---|
Tables, Views, Materialized Views, Incremental, Ephemeral | ✅ | ✅ |
Table comments | ✅ | ✅ |
Columns | ✅ | ✅ |
Data types | ✅ | ✅ |
Nullability (from not_null test) | ✅ | |
Foreign keys (from relationships test) | ✅ | ✅ |
Unique keys (from unique test) | ✅ | ✅ |
Column comments | ✅ | ✅ |
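The nullability, foreign key and unique key rows come from dbt's generic tests rather than from database constraints. As a rough illustration of that mapping (a sketch, not Dataedo's implementation), the Python function below scans a parsed manifest.json for test nodes. It assumes each test node exposes test_metadata.name, column_name and attached_node, as recent dbt manifest versions do; older manifests may lack attached_node. The function and dictionary names are hypothetical.

```python
from collections import defaultdict

# Generic dbt tests and the Dataedo metadata they correspond to (per the table above).
TEST_TO_METADATA = {
    "not_null": "nullability",
    "unique": "unique key",
    "relationships": "foreign key",
}


def constraints_from_manifest(manifest: dict) -> dict:
    """Return {(model_id, column_name): {metadata kinds}} derived from generic tests."""
    found = defaultdict(set)
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "test":
            continue
        # Generic tests carry their name in test_metadata; nodes without it are skipped.
        test_name = (node.get("test_metadata") or {}).get("name")
        kind = TEST_TO_METADATA.get(test_name)
        # Assumption: attached_node names the tested model (recent manifest versions).
        model_id = node.get("attached_node")
        column = node.get("column_name")
        if kind and model_id and column:
            found[(model_id, column)].add(kind)
    return dict(found)
```

With a real project, you would pass it the result of json.load on target/manifest.json.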
All dbt objects are displayed as views in Dataedo; the subtype indicates the actual materialization.
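To see which materialization, and therefore which subtype, a model has, you can read it from manifest.json. A minimal sketch, assuming the standard config.materialized field on model nodes and falling back to view, which is dbt's default materialization; the function name is hypothetical.

```python
def materializations(manifest: dict) -> dict:
    """Map each dbt model's unique ID to its materialization, e.g. 'table' or 'incremental'."""
    result = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        # dbt materializes models as views unless configured otherwise.
        result[unique_id] = node.get("config", {}).get("materialized", "view")
    return result
```

Keying by unique ID rather than model name avoids collisions between models with the same name in different packages.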
Data Lineage
Source | Method | Status |
---|---|---|
dbt object - dbt object | From manifest.json | ✅ |
database object - dbt object | From manifest.json | ✅ |
Dataedo always creates data lineage between dbt objects automatically. Additionally, it creates lineage between a dbt object and its related database object, but only if you also connect the data warehouse used in this dbt project.
Lineage is based on the dependencies that dbt models declare with `{{ source(source_name, table_name) }}` or `{{ ref(package_name, model_name) }}`. Consequently, lineage is imported into Dataedo only when it also appears in the documentation generated by dbt.
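The dependencies dbt records for source() and ref() calls are stored in manifest.json under each node's depends_on.nodes list. The sketch below, which is illustrative and not Dataedo's code, turns them into upstream/downstream pairs and labels whether the parent is a source (a database object) or another dbt node. The function name and the unique IDs in the test data are made up.

```python
def lineage_edges(manifest: dict) -> list:
    """Return sorted (parent_id, child_id, via) triples for every dbt model.

    via is "source" when the parent is a source() declaration (a database object)
    and "ref" when the parent is another dbt node referenced with ref().
    """
    edges = []
    for child_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            via = "source" if parent_id.startswith("source.") else "ref"
            edges.append((parent_id, child_id, via))
    return sorted(edges)
```

Roughly, "ref" edges correspond to the dbt object - dbt object row of the lineage table, while "source" edges correspond to database object - dbt object, which can only be linked to a real table when the warehouse is documented in Dataedo.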
Prerequisites
You need access to the folder on your computer where the dbt project is located. The folder must contain the dbt_project.yml file with the correct target-path specified (target by default). The folder that target-path points to must contain the catalog.json and manifest.json files (as a last resort, manifest.json alone will suffice).
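Before connecting, you could check these prerequisites with a short script. This Python sketch resolves target-path from dbt_project.yml and reports which artifacts exist. It is a simplification: it reads a top-level target-path line with a regular expression instead of a YAML parser, so plain and quoted values work but Jinja expressions do not. The function names and return strings are hypothetical.

```python
import re
from pathlib import Path

# Matches a top-level "target-path: value" line, optionally quoted, with an optional comment.
TARGET_PATH_RE = re.compile(
    r"^target-path:[ \t]*['\"]?([^'\"\n#]+?)['\"]?[ \t]*(?:#.*)?$", re.MULTILINE
)


def find_target_dir(project_dir: str) -> Path:
    """Return the artifacts folder named by target-path (default: 'target')."""
    project = Path(project_dir)
    config = (project / "dbt_project.yml").read_text(encoding="utf-8")
    match = TARGET_PATH_RE.search(config)
    return project / (match.group(1) if match else "target")


def check_artifacts(project_dir: str) -> str:
    """Report whether the artifacts needed for import are present."""
    target = find_target_dir(project_dir)
    has_manifest = (target / "manifest.json").is_file()
    has_catalog = (target / "catalog.json").is_file()
    if has_manifest and has_catalog:
        return "ok"
    if has_manifest:
        return "manifest only"  # the last-resort case described above
    return "missing manifest.json"
```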
Creating a catalog.json file
The catalog.json file is created when you run the dbt docs generate command. Open a terminal, navigate to the folder where the dbt project is located, and run dbt docs generate. If everything is configured correctly, the catalog.json file is generated in the target-path folder. If you run into problems, refer to the dbt documentation.
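catalog.json holds what dbt read from the warehouse, including column names, data types and comments. The following sketch lists the columns of each object in their table order. It assumes the catalog layout written by dbt docs generate (top-level nodes and sources, each entry with a columns mapping whose values carry name, type, index and comment); how Dataedo itself consumes these fields is not covered here, and the function name is hypothetical.

```python
def columns_from_catalog(catalog: dict) -> dict:
    """Return {unique_id: [(column_name, data_type, comment), ...]} in column order."""
    result = {}
    for section in ("nodes", "sources"):
        for unique_id, entry in catalog.get(section, {}).items():
            # index is the column's position in the warehouse table.
            cols = sorted(entry.get("columns", {}).values(), key=lambda c: c.get("index", 0))
            result[unique_id] = [(c["name"], c.get("type"), c.get("comment")) for c in cols]
    return result
```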
Connect Dataedo to dbt project
To connect to a dbt project, create new documentation by clicking Add documentation and choosing Database connection.
In the Add documentation window, choose dbt (beta) and click Next >.
On the next screen, click the three dots next to the path field, browse to the dbt project folder (it must contain the dbt_project.yml file), select it, and confirm with OK.
For best results, also select the data warehouse in which dbt runs, provided it is already documented in Dataedo; this is what enables lineage between dbt objects and database objects. To do this, click the dropdown next to the Database (optional) field and select that database.
Click the Connect button at the bottom right of the window.
In the next window, select the objects you want to import into Dataedo. To exclude objects in bulk, use the advanced import filter. After selecting the objects, click Next >.
Add a document title and click Import.
Outcome
The dbt project has been imported into a new documentation, and data lineage has been created automatically.