Azure Synapse Pipelines is a cloud-based data integration and orchestration service provided by Microsoft Azure. It allows you to create, schedule, and manage data workflows, as well as move and transform data across various sources and destinations.
To obtain complete lineage for your Azure Synapse Analytics environment, we highly recommend importing data into Dataedo using both the Azure Synapse Analytics connector and the Azure Synapse Pipelines connector.
Supported elements and metadata
Objects imported
- Pipelines as ETL Programs
  - Name
  - Folder (as schema)
- Activities as Processes in Data Lineage
  - Name
  - Data lineage (object and column level, details below)
- Integration Datasets as Datasets
  - Name
  - Folder (as schema)
  - Columns
- Sources
  - Name
  - Columns (if the linked service is imported to Dataedo)
- Destinations
  - Name
  - Columns (if the linked service is imported to Dataedo)
Dataedo imports all the activities as data processes within pipelines. Nested activities like ForEach, Until, IfCondition, and Switch are also imported with all the activities inside them.
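To make "imported with all the activities inside them" concrete, here is a minimal sketch that walks a pipeline definition and collects every activity, including those nested inside container activities. It assumes the pipeline JSON layout used by Azure Data Factory and Synapse: child activities under `typeProperties.activities` for ForEach and Until, `ifTrueActivities`/`ifFalseActivities` for IfCondition, and `cases[].activities` plus `defaultActivities` for Switch. The function name `iter_activities` and the sample pipeline are hypothetical; this is an illustration, not Dataedo's implementation.

```python
from typing import Iterator

# Container activity types and the typeProperties keys that hold their child activities.
# Switch is handled separately because its children sit inside cases[].
CONTAINER_KEYS = {
    "ForEach": ["activities"],
    "Until": ["activities"],
    "IfCondition": ["ifTrueActivities", "ifFalseActivities"],
}


def iter_activities(activities: list, path: str = "") -> Iterator[tuple[str, dict]]:
    """Yield (path, activity) for every activity, descending into container activities."""
    for activity in activities:
        full_path = f"{path}/{activity['name']}" if path else activity["name"]
        yield full_path, activity
        props = activity.get("typeProperties", {})
        children: list = []
        if activity.get("type") == "Switch":
            for case in props.get("cases", []):
                children.extend(case.get("activities", []))
            children.extend(props.get("defaultActivities", []))
        else:
            for key in CONTAINER_KEYS.get(activity.get("type", ""), []):
                children.extend(props.get(key, []))
        yield from iter_activities(children, full_path)


# Hypothetical pipeline with nested ForEach, IfCondition and Switch activities.
pipeline = {
    "name": "LoadSales",
    "properties": {
        "activities": [
            {"name": "LookupFiles", "type": "Lookup"},
            {
                "name": "ForEachFile",
                "type": "ForEach",
                "typeProperties": {
                    "activities": [
                        {
                            "name": "CheckSize",
                            "type": "IfCondition",
                            "typeProperties": {
                                "ifTrueActivities": [{"name": "CopyFile", "type": "Copy"}],
                                "ifFalseActivities": [{"name": "LogSkip", "type": "SetVariable"}],
                            },
                        }
                    ]
                },
            },
            {
                "name": "RouteByRegion",
                "type": "Switch",
                "typeProperties": {
                    "cases": [{"value": "EU", "activities": [{"name": "CopyEU", "type": "Copy"}]}],
                    "defaultActivities": [{"name": "CopyOther", "type": "Copy"}],
                },
            },
        ]
    },
}

paths = [p for p, _ in iter_activities(pipeline["properties"]["activities"])]
for p in paths:
    print(p)
```

Recording the path of each activity keeps nested activities attributable to their container, which is how they can be shown as processes within a pipeline.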
Sources and destinations are created from dataset information and, for parameterized datasets and linked services, from runtime log information, so that they can be presented in lineage.
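Runtime information matters for parameterized datasets because a dataset's location may be an expression such as `@dataset().FileName`, which has no concrete value until a pipeline run supplies one. The simplified sketch below substitutes parameter values into a dataset location definition. It only handles bare `@dataset().<name>` values and `@{dataset().<name>}` interpolation, not other expression functions, and the `run_params` dictionary stands in for values that would come from run logs. All names and values are hypothetical.

```python
import re
from typing import Any

# Matches @{dataset().Name} (string interpolation) or a bare @dataset().Name.
_DATASET_PARAM = re.compile(r"@\{dataset\(\)\.(\w+)\}|@dataset\(\)\.(\w+)")


def resolve(value: Any, params: dict[str, Any]) -> Any:
    """Recursively replace dataset-parameter references with run-time values.

    Simplified: functions such as concat() or pipeline() are not evaluated,
    and a parameter missing from `params` raises KeyError.
    """
    if isinstance(value, dict):
        # {"value": "...", "type": "Expression"} wraps a dynamic value.
        if value.get("type") == "Expression":
            return resolve(value.get("value"), params)
        return {k: resolve(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, params) for v in value]
    if isinstance(value, str):
        return _DATASET_PARAM.sub(lambda m: str(params[m.group(1) or m.group(2)]), value)
    return value


# Hypothetical parameterized dataset location and parameter values taken from a run.
dataset_location = {
    "type": "AzureBlobStorageLocation",
    "container": "landing",
    "folderPath": {"value": "@{dataset().Folder}/raw", "type": "Expression"},
    "fileName": {"value": "@dataset().FileName", "type": "Expression"},
}
run_params = {"Folder": "sales", "FileName": "sales_2024.csv"}

resolved = resolve(dataset_location, run_params)
print(resolved)
```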
Activities with automatic data lineage
- Copy activity - Object-level and column-level lineage
- Dataflow - Object-level lineage
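Column-level lineage for a Copy activity can be read from its explicit column mapping. The sketch below assumes the Azure Data Factory/Synapse Copy activity JSON, where `typeProperties.translator` of type `TabularTranslator` holds a `mappings` list of `source`/`sink` pairs. The function `column_lineage` and the sample activity are hypothetical, and when no explicit mapping is defined the sketch simply returns an empty list rather than inferring one.

```python
from typing import Any, Optional


def column_lineage(copy_activity: dict[str, Any]) -> list[tuple[Optional[str], Optional[str]]]:
    """Return (source column, sink column) pairs from a Copy activity's explicit mapping.

    Returns an empty list when no explicit mapping is defined; inferring an
    implicit mapping is out of scope for this sketch.
    """
    translator = copy_activity.get("typeProperties", {}).get("translator") or {}
    pairs = []
    for mapping in translator.get("mappings", []):
        source = mapping.get("source", {})
        sink = mapping.get("sink", {})
        # Tabular columns are identified by "name"; hierarchical (e.g. JSON) fields by "path".
        pairs.append((source.get("name") or source.get("path"),
                      sink.get("name") or sink.get("path")))
    return pairs


# Hypothetical Copy activity with an explicit column mapping.
copy_customers = {
    "name": "CopyCustomers",
    "type": "Copy",
    "typeProperties": {
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {"source": {"name": "Id"}, "sink": {"name": "CustomerID"}},
                {"source": {"name": "Name"}, "sink": {"name": "CustomerName"}},
            ],
        }
    },
}

for src, dst in column_lineage(copy_customers):
    print(f"{src} -> {dst}")
```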
Automatic data lineage
The Dataedo Azure Synapse Pipelines connector always creates automatic data lineage of the form dataset -> task -> dataset. For the following linked service types, it also supports the full chain source -> dataset -> task -> dataset -> sink:
- Microsoft SQL Server
- Azure SQL Server
- Azure Synapse Analytics
- Azure Blob Storage
- Azure Data Lake Storage
- Amazon S3
- Postgres
- Redshift
- MariaDB
- MySQL
- MongoDB
- Snowflake
- DB2
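The two lineage shapes described above can be modeled as a list of directed edges. In the minimal sketch below, dataset -> task -> dataset edges are always added, and source -> dataset or dataset -> sink edges are added only when the dataset's linked service type appears in the list above. The `datasets` dictionary shape, the `source:`/`dataset:`/`task:`/`sink:` labels, and the function `lineage_edges` are hypothetical and only illustrate the idea; they are not Dataedo's data model.

```python
from typing import Any

# Linked service types from the list above, as named in this article (hypothetical identifiers).
FULL_CHAIN_TYPES = {
    "Microsoft SQL Server", "Azure SQL Server", "Azure Synapse Analytics",
    "Azure Blob Storage", "Azure Data Lake Storage", "Amazon S3", "Postgres",
    "Redshift", "MariaDB", "MySQL", "MongoDB", "Snowflake", "DB2",
}


def lineage_edges(activity: dict[str, Any], datasets: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Build lineage edges for one activity.

    `datasets` maps a dataset name to {"linked_service_type": ..., "object": ...}.
    dataset -> task -> dataset edges are always added; source/sink edges only
    for linked service types in FULL_CHAIN_TYPES.
    """
    task = f"task:{activity['name']}"
    edges: list[tuple[str, str]] = []
    for ref in activity.get("inputs", []):
        ds = ref["referenceName"]
        info = datasets.get(ds, {})
        if info.get("linked_service_type") in FULL_CHAIN_TYPES:
            edges.append((f"source:{info['object']}", f"dataset:{ds}"))
        edges.append((f"dataset:{ds}", task))
    for ref in activity.get("outputs", []):
        ds = ref["referenceName"]
        info = datasets.get(ds, {})
        edges.append((task, f"dataset:{ds}"))
        if info.get("linked_service_type") in FULL_CHAIN_TYPES:
            edges.append((f"dataset:{ds}", f"sink:{info['object']}"))
    return edges


# Hypothetical Copy activity: the input uses a listed type, the output does not.
copy_activity = {
    "name": "CopyCustomers",
    "type": "Copy",
    "inputs": [{"referenceName": "CustomersCsv", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "CustomersTable", "type": "DatasetReference"}],
}
datasets = {
    "CustomersCsv": {"linked_service_type": "Azure Blob Storage", "object": "landing/customers.csv"},
    "CustomersTable": {"linked_service_type": "Oracle", "object": "SALES.CUSTOMERS"},  # not in the list above
}

for edge in lineage_edges(copy_activity, datasets):
    print(" -> ".join(edge))
```

Here the input side gets the full source -> dataset link, while the output side stops at the dataset because its linked service type is not in the list.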
Connecting to Azure Synapse
Permissions
To list resource groups in the connection window, you must be assigned the Azure Synapse Contributor role at the resource group level or above.
To run the import, you need only read access to the workspace you want to document. In that case, however, you must enter the resource group and workspace names manually instead of picking them from the list.
Add new connection
To connect to Azure Synapse Pipelines and create new documentation, click Add documentation and choose Database connection.
On the connection screen, choose Azure Synapse Pipelines as the DBMS.
Connection details
- Subscription - Azure subscription assigned to your workspace.
- Resource group - Resource group that contains your workspace.
- Workspace - Name of the workspace you want to extract metadata from.
To find these values, go to portal.azure.com, search for your Azure Synapse Analytics workspace name, and open the resource.
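All three connection values are also contained in the workspace's Azure resource ID, which the portal typically shows in the resource's JSON View or Properties page and which follows the pattern `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Synapse/workspaces/<workspace-name>`. The sketch below splits such an ID into the fields; note that it yields the subscription ID (a GUID), not the subscription's display name. The function `parse_workspace_id` and the sample ID are hypothetical.

```python
def parse_workspace_id(resource_id: str) -> dict[str, str]:
    """Split a Synapse workspace resource ID into subscription, resource group and workspace."""
    parts = resource_id.strip().strip("/").split("/")
    # None marks the variable segments; fixed segments are compared case-insensitively.
    expected = ["subscriptions", None, "resourcegroups", None,
                "providers", "microsoft.synapse", "workspaces", None]
    if len(parts) != len(expected) or any(
        e is not None and p.lower() != e for p, e in zip(parts, expected)
    ):
        raise ValueError(f"Not a Synapse workspace resource ID: {resource_id}")
    return {"subscription": parts[1], "resource_group": parts[3], "workspace": parts[7]}


# Hypothetical resource ID with a placeholder subscription GUID.
rid = ("/subscriptions/00000000-0000-0000-0000-000000000000"
       "/resourceGroups/rg-analytics/providers/Microsoft.Synapse/workspaces/contoso-synapse")
print(parse_workspace_id(rid))
```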
Importing objects
When the connection is successful, Dataedo reads the objects and shows a list of the objects found. You can choose which objects to import. You can also use an advanced filter to narrow down the list of objects.
Confirm the list of objects to import by clicking Next.
The next screen lets you change the default name of the documentation, under which it will be visible in the Dataedo repository.
Click Import to start the import.
When the import is done, close the import window with the Finish button.
Outcome
Your Azure Synapse Analytics objects have been imported to new documentation in the repository.