SQL Parsing Overview

Hubert Książek - Dataedo Team Hubert Książek 4th May, 2023

A SQL parser is a tool that reads SQL code and converts it into a format that can be understood and processed by a computer. The parser is an important component of any metadata management system like Dataedo as it allows them to analyze SQL queries and fetch the relevant metadata.

This documentation will provide you with information on where the Dataedo SQL parser is used.

Import schema from DDL

Data Definition Language (DDL) is a subset of SQL that is used to describe database structure and relationships in a database. With the help of an SQL Parser, Dataedo imports the schema based on the given DDL. Read more.

Import metadata from DDL

Supported dialects

This function is available in the following dialects:

Automatic column-level lineage from SELECT/CREATE VIEW statements

Dataedo will create column-level data lineage for views and SQL queries based on the script. Just leave the "Automatic lineage" option selected during import.

How to enable automatic data lineage from Dataedo during import

If you disable this feature during import, you will still be able to trigger the automated creation of column-level lineage through the "Parse Script to Lineage" functionality, which is located in the Script tab.

How to enable automatic data lineage from Dataedo after import using Parse script to lineage

This function also works for manually added views that were created using a script, as well as for SQL Query objects.

Supported dialects

This function is available in the following dialects:

Supported SQL syntax

We support the creation of an automatic lineage in all dialects for the following SQL syntax:

  • SELECT and CREATE VIEW statements,
  • Aliases,
  • Functions and operators,
  • UNIONs,
  • JOINs,
  • Subqueries and nested queries,
  • CASEs and other logic-related expressions,
  • Not Recursive Common Table Expressions (CTE).

If any dialect supports additional SQL syntax there will be information about it on its subpage.

We do not support the creation of an automatic lineage for the following SQL syntax:

  • Statements that access nested data such JSON, XML, or objects as columns
  • PIVOT and UNPIVOT relational operators,
  • Recursive Common Table Expressions (Recursive CTE),
  • Dynamic queries,
  • Window functions.

Automatic column-level lineage from imported Stored Procedures

Dataedo will create column-level data lineage for stored procedures based on the script. Just leave the "Automatic lineage" option selected during import or doing import changes, it is not possible yet to parse a single procedure or manual script beyond the mentioned operations.

Dataedo will divide the procedure into steps and represent them as processes, then create a data lineage for supported steps. Unsupported steps will be named after the first word from the process script and end with three dots.

Column-level data lineage from stored procedures

In the script tab, the processes contain fragments of the procedure script to which they refer.

Script of process definition from a stored procedure

Supported dialects

This function is available in the following dialects:

We are working on support for data lineage from stored procedures for other dialects.

Supported SQL syntax

We support the creation of an automatic lineage in all dialects for the following SQL statements:

  • SELECT INTO,
  • INSERT (without inserting constant values),
  • UPDATE (without updating to constant values),
  • CREATE TABLE AS SELECT.

When temporary tables are created in a procedure (ones that do not normally exist in the documented database), Dataedo will try to combine the lineage that goes into them with the lineage that comes out of them, i.e. instead of SOURCE_TABLE -> #TEMP_TABLE -> DESTINATION_TABLE, will represent it as SOURCE_TABLE -> DESTINATION_TABLE.

Regrettably, due to the nature of ANTLR, the presence of unsupported elements within a procedure may result in the entire procedure being unable to be parsed. In the current stage, we treat procedures incorporating elements beyond those explicitly outlined as unsupported. Nevertheless, we do our best to make sure that the unsupported elements do not affect supported steps.

We do not support the creation of an automatic lineage for the following SQL syntax:

  • Common Table Expressions (WITH statements),
  • variables and parameters,
  • MERGE statements,
  • OUTPUT statements,
  • INSERT VALUES,
  • UPDATE when constants are set,
  • DELETE statements,
  • BULK INSERT statements,
  • COPY INTO statements,
  • syntax mentioned as unsupported in SELECT/CREATE VIEW chapter.

Extract information from PL/SQL packages

We use our SQL parser to break PL/SQL packages into individual objects - functions and stored procedures with parameters and scripts.

Extract information from Oracle packages

Additional information

Parser timeout

In Local user settings you can set the maximum time how much parsing one script can take. If it is exceeded, the parsing of this script will be automatically canceled.

Timeout for parser

Found issue with this article? Comment below
Comments are only visible when the visitor has consented to statistics cookies. To see and add comments please accept statistics cookies.
0
There are no comments. Click here to write the first comment.