Snowflake - Automatic Data Lineage

Hubert Książek - Dataedo Team Hubert Książek 3rd June, 2024

What to Expect

Views/Materialized views

Data lineage for views/materialized views is created using our SQL Analyzer (Parser). Our analyzer creates column-level data lineage based on the view script. The data flow is created between Snowflake objects (tables, views, etc.) and view.

Data lineage for views

Dynamic tables

Data lineage for dynamic tables is created using our SQL Analyzer (Parser) and using Snowflake metadata (INFORMATION_SCHEMA.DYNAMIC_TABLE_GRAPH_HISTORY()).

  1. Our analyzer creates column-level data lineage based on the dynamic table script.
  2. Snowflake metadata provides only object-level data lineage.

The data flow is created between Snowflake objects (tables, views, etc.) and dynamic table.

Data lineage for dynamic tables

Because dynamic tables behave more like views than tables, their main type is View (with subtype Dynamic table), not Table. That's why they are displayed on screenshots as views.

Stages

Data lineage for stages is created using Snowflake metadata (INFORMATION_SCHEMA.STAGES). The data flow is created on object-level between external storage (like S3 or Azure Blob Storage) objects (CSV, JSON, Parquet files, folders, etc.) and Snowflake stage.

Data lineage for stages

External tables

Data lineage for external tables is created using Snowflake metadata (INFORMATION_SCHEMA.EXTERNAL_TABLES). The data flow is created on object-level between Snowflake stage and external table.

Data lineage for external tables

Snowpipes (Snowflake Data Pipelines)

Data lineage for Snowpipes is created using simple regex pattern matching on the Snowpipe script. The data flow is created on object-level between Snowflake stage and Snowpipe and then between Snowpipe and table: stage → snowpipe → table.

Data lineage for snowpipes

Direct loading

Dataedo supports data lineage created by directly loading data into tables from stages using the COPY INTO command. The data flow is created based on Snowflake metadata (ACCOUNT_USAGE.COPY_HISTORY - requires requires elevated permissions). The data flow is created on object-level between Snowflake stage and table with SQL Script as the processor.

Data lineage for manual COPY INTO

Known limitations

  1. Check the limitations for views lineage from SQL parser.
  2. With more complex snowpipes copy statements, the data lineage may not be created correctly.
  3. Snowpipes and stages regex patterns are ignored when lineage is created.
  4. Lineage from snowpipes with transformations is not supported.

Troubleshooting

Views/Materialized views

  1. Make sure the view script was imported correctly and is valid (e.g. referenced objects exist).
  2. Check if the view script contains any unsupported SQL syntax - parser documentation.
  3. Make sure that dialect is set correctly and "Automatic lineage" and "Parse SQL" checkboxes are checked in the Snowflake import window.
  4. Try to reparse the view script. You can do it by clicking the "Parse Script to Lineage" button or by rerunning the import process.

Dynamic tables

  1. Make sure the dynamic table script was imported correctly and is valid (e.g. referenced objects exist).
  2. Check if the dynamic table script contains any unsupported SQL syntax - parser documentation.
  3. Make sure that dialect is set correctly and "Automatic lineage" and "Parse SQL" checkboxes are checked in the Snowflake import window.
  4. Verify in Snowflake if SELECT * FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_GRAPH_HISTORY()) returns any results using the same user, role, and database as provided in the Dataedo.
  5. Try to reparse the dynamic table script. You can do it by clicking the "Parse Script to Lineage" button or by rerunning the import process.

Stages

  1. Make sure that referenced storage objects are imported to Dataedo and contain valid paths.
  2. Make sure the "Automatic lineage" checkbox is checked in the Snowflake import window.
  3. Verify in Snowflake if SELECT * FROM INFORMATION_SCHEMA.STAGES returns any results using the same user, role, and database as provided in the Dataedo.
  4. Try rerunning the import process (especially if referenced storage objects were added after the last import).

External tables

  1. Make sure the "Automatic lineage" checkbox is checked in the Snowflake import window.
  2. Verify in Snowflake if SELECT * FROM INFORMATION_SCHEMA.EXTERNAL_TABLES returns any results using the same user, role, and database as provided in the Dataedo.
  3. Try rerunning the import process.

Snowpipes (Snowflake Data Pipelines)

  1. Make sure the Snowpipe script was imported correctly and is valid (e.g. referenced objects exist).
  2. Make sure the "Automatic lineage" checkbox is checked in the Snowflake import window.
  3. Verify if the Snowpipe script contains transformations (should not contain any SELECT).
  4. Try rerunning the import process.

Direct loading

  1. Make sure the "Automatic lineage" and "Copy history" checkboxes are checked in the Snowflake import window.
  2. Verify in Snowflake if SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY returns any results using the same user, role, and database as provided in the Dataedo. SNOWFLAKE.ACCOUNT_USAGE requires elevated permissions.
  3. Try rerunning the import process.
Found issue with this article? Comment below
Comments are only visible when the visitor has consented to statistics cookies. To see and add comments please accept statistics cookies.
0
There are no comments. Click here to write the first comment.