What to Expect
Dataedo automatically imports:
- Pipelines
- Activities (as data lineage processes in Dataedo)
- Datasets
- Sources of copy activity
- Sinks of copy activity as Destinations in Dataedo
- Column-level lineage for copy activity
Due to the specifics of how ADF operates, lineage does not pass through datasets. Instead, Dataedo generates Sources and Destinations based on datasets and their parameters, because a single parameterized dataset can be used multiple times.
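For illustration, below is a minimal sketch of a parameterized Azure SQL dataset (all names are hypothetical). Because SchemaName and TableName are supplied at run time, the same dataset definition can point to many different tables, so Dataedo generates a separate Source or Destination for each parameter combination rather than for the dataset itself.
{
    "name": "GenericAzureSqlTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": { "referenceName": "AzureSqlLinkedService", "type": "LinkedServiceReference" },
        "parameters": {
            "SchemaName": { "type": "String" },
            "TableName": { "type": "String" }
        },
        "typeProperties": {
            "schema": { "value": "@dataset().SchemaName", "type": "Expression" },
            "table": { "value": "@dataset().TableName", "type": "Expression" }
        }
    }
}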
Parameterized linked services, datasets and pipelines
Lineage for parameterized linked services, datasets, and pipelines can be generated based on Pipeline Runs and Activity Runs logs. To enable it, check the Analyze Pipeline Runs checkbox and specify how many recent days and how many of the most recent pipeline executions should be analyzed.
If parameters contain Azure Expressions, their interpreted values are not stored in the logs. Dataedo recreates the entire flow and reconstructs the parameter value assignments.
When interpreting Azure Expressions, Dataedo supports the following functions:
- concat
- toLower
- utcNow
Examples of supported expressions:
@concat('SELECT column_names FROM dbo.salesforce_column_names WHERE name = ''', pipeline().parameters.ObjectName, '''')
@concat('Copy_', item().TargetTable)
@concat('sales/', toLower(item().name), '/', utcNow('yyyy-MM-dd'), '/')
SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]
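For example, assuming a hypothetical iteration in which item().name resolves to 'Orders' and the pipeline ran on 2024-05-01 (values read back from the Pipeline and Activity Runs logs), the third expression above would be interpreted as:
@concat('sales/', toLower(item().name), '/', utcNow('yyyy-MM-dd'), '/')  ->  sales/orders/2024-05-01/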
Supported linked services
Lineage for the data source types below is linked automatically:
- Microsoft SQL Server
- Azure SQL Server
- Azure Synapse Analytics
- Azure Blob Storage
- Azure Data Lake Storage
- Amazon S3
- Postgres
- MariaDB
- MySQL
- MongoDB
- Snowflake (the linked source in Dataedo needs to be linked to the source database in the repository manually; the next import will then detect column-level lineage to tables)
- Salesforce (the linked source in Dataedo needs to be linked to the source database in the repository manually; the next import will then detect column-level lineage to objects)
- REST (object-level lineage)
Copy Activity
Dataedo supports lineage for the Table, Query, and Stored Procedure source types.
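As a rough illustration (assuming an Azure SQL source; dbo.Orders and dbo.usp_GetOrders are hypothetical names), the three source variants of a Copy Activity look like this:
- Table: "source": { "type": "AzureSqlSource" } (the table name is taken from the dataset)
- Query: "source": { "type": "AzureSqlSource", "sqlReaderQuery": "SELECT * FROM dbo.Orders" }
- Stored Procedure: "source": { "type": "AzureSqlSource", "sqlReaderStoredProcedureName": "dbo.usp_GetOrders" }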
A pipeline triggers an Execute Pipeline activity, which runs a Copy Activity inside a ForEach loop
In the example below there are two pipelines. On the first screen, the pipeline uses a Lookup activity to get table names and schemas from a database table, then runs Execute Pipeline and passes an array of Table Schema and Table Name parameters to the invoked pipeline. On the second screen, the invoked pipeline receives the Table Schema and Table Name parameters and runs a Copy Activity with dynamic parameters inside a ForEach loop.
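A simplified sketch of the inner pipeline's ForEach activity could look as follows (hypothetical names, Azure SQL source assumed, dataset references omitted for brevity). Note isSequential, which corresponds to the limitation on sequential loops described below.
{
    "name": "ForEachTable",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": true,
        "items": { "value": "@pipeline().parameters.TableList", "type": "Expression" },
        "activities": [
            {
                "name": "CopyTable",
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "sqlReaderQuery": {
                            "value": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
                            "type": "Expression"
                        }
                    },
                    "sink": { "type": "AzureSqlSink" }
                }
            }
        ]
    }
}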
When the import is run with the Analyze Pipeline Runs option checked and there are successful runs of those pipelines in the Pipeline and Activity Runs logs, end-to-end column-level lineage will be created in Dataedo inside the second pipeline (the one that actually copies the data):
We can also view the lineage for a specific iteration:
Known Limitations
- When a ForEach loop is used in a pipeline, we only support lineage for loops that run sequentially. Due to inconsistencies in Azure Data Factory Pipeline and Activity Runs logs, lineage for parallel runs is not supported.
- In Azure Expressions, only expressions with a String return type are supported.
- In Linked Services, Datasets, and Pipelines parameters, only the String type is supported.
- In Azure Expressions, only the concat, utcNow, and toLower functions are supported.
Troubleshooting
Lineage is created based on a matched linked source. Check in Dataedo Desktop to make sure it is matched correctly.