Synapse Lineage not Mapped on Existing Data Assets in Purview

JF 46 Reputation points
2023-10-18T07:56:32.6833333+00:00

Hello,

For the past few weeks, the automatic mapping of lineage from Synapse Analytics pipelines and dataflows onto existing Gen 2 Data Lake data assets has not been working.

I have several custom pattern rules that override Purview's default behavior for recognizing resource sets. When I run a scan on the Gen 2 Data Lake, the parquet resource sets are identified correctly. Recently, the scan has also started detecting parquet folders, which was not the case before. Either way, the scan produces the data assets as intended.

Normally, the lineage information from Synapse dataflows and pipelines should be mapped automatically onto these data assets. This used to work well, but it no longer does. Instead, many duplicate Gen 2 data assets are now created under the root folder in Purview. This should not happen, because Purview should recognize that these assets already exist from the earlier scans.

Has anything changed in the lineage mechanism between Purview and Synapse? Does this have something to do with the change from Azure Purview to Microsoft Purview? I am also using the new Purview interface, which is still in preview at the moment.

Thanks in advance!

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Microsoft Purview
A Microsoft data governance service that helps manage and govern on-premises, multicloud, and software-as-a-service data. Previously known as Azure Purview.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 80,251 Reputation points Microsoft Employee
    2023-10-19T06:37:43.35+00:00

    @JF - Thanks for the question and for using the Microsoft Q&A platform.

    It seems you are experiencing an issue with the automatic mapping of lineage from Synapse Analytics pipelines and dataflows onto existing Gen 2 Data Lake data assets. I understand that you have several custom pattern rules that override Purview's default resource set recognition, and that a scan of the Gen 2 Data Lake identifies the parquet resource sets correctly. However, the lineage information from Synapse dataflows and pipelines is not mapped onto these data assets, and many duplicate Gen 2 data assets are created under the root folder in Purview.

    To answer your questions, there have been no changes in the lineage mechanism between Purview and Synapse that could cause this issue. It is possible that the change from Azure Purview to Microsoft Purview could have caused some changes, but I cannot say for sure without more information.

    To troubleshoot this issue, I would recommend checking the following:

    1. Check if the data assets that are not being mapped already exist in the data map. If they do, then the lineage information from Synapse should be automatically added to them. If they don't, then Purview should create new data assets for them.
    2. Check if the data assets that are being created as duplicates have the same name and path as the existing data assets. If they do, then Purview might be creating new data assets instead of mapping the lineage information to the existing ones.
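
The comparison in step 2 can be sketched in a few lines of Python. This is only an illustration, not a Purview API: it assumes you have already exported the asset list from the data map by some means into dictionaries with a `qualifiedName` field, and that Gen 2 assets are identified by an `https://<account>.dfs.core.windows.net/<container>/<path>` style name. The function names, the `source` field, and the sample values are all hypothetical. It groups assets whose names differ only by host casing or a trailing slash, which is one plausible way a lineage-created asset could fail to match a scanned one.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_qualified_name(qualified_name: str) -> str:
    """Normalize a (hypothetical) asset name for comparison.

    Lowercases the scheme and host and drops a trailing slash.
    The path itself is left case-sensitive.
    """
    parts = urlsplit(qualified_name.strip())
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def find_duplicate_assets(assets):
    """Group assets by normalized name and keep groups with more than one entry."""
    groups = defaultdict(list)
    for asset in assets:
        groups[normalize_qualified_name(asset["qualifiedName"])].append(asset)
    return {name: items for name, items in groups.items() if len(items) > 1}


# Made-up sample data standing in for an export of the data map.
assets = [
    {"qualifiedName": "https://mylake.dfs.core.windows.net/raw/sales/", "source": "scan"},
    {"qualifiedName": "https://MyLake.dfs.core.windows.net/raw/sales", "source": "lineage"},
    {"qualifiedName": "https://mylake.dfs.core.windows.net/raw/customers", "source": "scan"},
]

duplicates = find_duplicate_assets(assets)
for name, items in duplicates.items():
    print(name, [item["source"] for item in items])
```

If the duplicates only show up after normalization like this, the names written by the lineage push and by the scan are not byte-identical, which is a useful detail to include in a support ticket.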

    If none of these steps help resolve the issue, I would recommend opening a support ticket for further assistance. The support team should be able to help you troubleshoot the issue and provide a solution.
