Documenting lineage for SQL Server Integration Services (SSIS)

Applies to: Dataedo 23.x versions, Article available also for: 24.x (current), 10.x
You are looking at documentation for an older release.
Switch to the documentation for Dataedo 24.x (current).

SQL Server Integration Services (SSIS) is a Microsoft's ETL platform for building data transformations and data ingrations solutions. Dataedo supports documenting the SSIS packages contained in ISPAC deployment project files, DTSX package files, and connecting to a SSIS server.

Supported elements and metadata

Packages will be imported as Packages in Dataedo. Tasks within a package will be imported as data processes in Data Lineage, so that the data lineage can be easily extended with data flows.

Image title

Sources and Destinations types of objects from SSIS which are in Data Flow are imported into dataedo as Sources and Destinations.

Image title

If the source of the data is an SQL query then Dataedo parses that query, and based on that parsing a lineage is created.

Image title

Connections that are in Connection manager in SSIS Package will be imported into Dataedo as Linked Sources.

Linked source contains information about the connection such as server, host, etc. If Dataedo detects that the database to which the connection relates was previously documented, it will assign the matching database to the Linked source. More about Linked Sources in this article

Dataedo imports project connections and package connections

Image title

Automatic data lineage

Before documenting SSIS project/package make sure that all of the data sources and destinations are imported as data sources in Dataedo. Furhtermore, the data source and destination should be created on package level. If all of the prerequisites are met, Dataedo will automatically create Data Lineage for following tasks (Control Flow):

  • Data Flow Task (source and destination)
  • Bulk Insert Task (destination)
  • Execute SQL Task
  • FTP Task

The data lineage for Sources and Destinations is also predicted when data sources are documented in Dataedo, and is created in SSIS by these objects from the Toolbox:

  • ADO NET Source
  • Flat File Source
  • OLE DB Source
  • ODBC Source

  • ADO NET Destination

  • Flat File Destination
  • OLE DB Destination
  • SQL Server Destination

Automatic column data lineage

This type of lineage is provided for specific objects. At the Process level in Dataedo, a column lineage is created when a process contains these objects from the SSIS Toolbox:

  • Balanced Data Distributor
  • Conditional Split
  • Lookup
  • Multicast
  • Catche Transform
  • Export Column
  • Procentage Sampling

Column lineage, but not for columns newly created in the process:

  • Data Conversion
  • Derived Column Script Component
  • Character Map
  • Copy Column
  • Fuzzy Grouping

There is no column lineage, but there is lineage to the table:

  • Aggregate
  • Sort
  • Union
  • Pivot
  • Row Sampling
  • Unpivot

At the Sources and Destinations object level in Dataedo, the column lineage is also available.

Image title

Connect Dataedo to SQL Server Integration Services SSIS

To connect to SSIS create new documentation by clicking Add and choosing New connection.

Image title

On the Add documentation window choose SQL Server Integration Services (SSIS):

add-documentation

On the next screen select Import Type:

  • ISPAC project deployment file - if you want to document all packages in a project file exported from SSIS SQL Server instance.
  • DTSX package file - if you want to document just one package, either extracted from ISPAC file, or created in Visual Studio.
  • SSIS Server - this option allows you to connect to a SQL Server instance holding an SSIS project.

ISPAC project deployment file and DTSX package file import

For first two options select the path of the appropriate file by clicking Browse button.

select-file

Click Connect. Dataedo will now read all the packages and will list them as Package type. Select packages to import and click Next. Set the Documentation Title and click Import.

SSIS Server import

For SSIS Server option you will need to fill in connection details:

ssis-server-conn

SSIS Server connection details

  • Server name - SQL Server instance where SSIS is stored, e.g. localhost.
  • Port - port under which SQL Server is available. In most cases you can leave default value of 1433
  • Authentication - select authentication type for connection. These are same as authentication types for SQL Server. For selected authentication type you will need to fill in appropriate information:
    • Windows Authentication - Dataedo will use your Windows account, hence no further details are needed
    • SQL Server Authentication - user and password
    • Azure Active Directory - Password - Azure AD user and password
    • Azure Active Directory - Integrated - Azure AD login associated with Windows account
    • Azure Active Directory - Universal with MFA - two-step authentication where you are redirected to Microsoft MFA login webpage.
  • Database - select SSISDB (Catalog Database) from list or type its' name
  • Folder - select Folder in which project is stored
  • Project - select Project which is to be documented

How to export an ISPAC project file

Connect to a SQL Server instance containing the SSIS project you want to document with SQL Server Management Studio. Then expand Integration Services Catalog, look for the project you want to document (on screenshot it's Daily ETL). Then right-click on the project and choose Export.

Export ispac

The windows explorer window will appear. Choose a location for the file and save.

Outcome

Image title

Specification

Imported metadata

Dataedo reads following metadata from SSIS.

Imported Editable
Packages
  Control Flows as Processes
   Column data lineage
   Process description
   Process script
  Variables
  Script
  Description
Sources and Destinations
  Data Lineage
   Column data lineage
   Process description
   Process script
  Script
  Description
Shared metadata
  Dependencies
  Created time
  Last updated time