Azure Purview lineage for Azure Databricks

sakuraime 2,271 Reputation points
2021-11-24T14:07:24.96+00:00

If I have an Azure Databricks notebook that, for example, joins two Parquet files (from Azure Blob Storage) and writes the output back to the same Blob Storage, can Azure Purview lineage detect that? Is there an example of how to achieve it?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Microsoft Purview
A Microsoft data governance service that helps manage and govern on-premises, multicloud, and software-as-a-service data. Previously known as Azure Purview.

2 answers

  1. PRADEEPCHEEKATLA-MSFT 53,096 Reputation points Microsoft Employee
    2021-11-25T06:30:20.613+00:00

    Hello @sakuraime,

    (UPDATE: 25/07/2022): The Microsoft Purview team released an open-source solution accelerator to extract lineage from Databricks and ingest it into Microsoft Purview: microsoft/Purview-ADB-Lineage-Solution-Accelerator, a connector to ingest Azure Databricks lineage into Microsoft Purview (github.com).

    This solution accelerator, together with the OpenLineage project, provides a connector that will transfer lineage metadata from Spark operations in Azure Databricks to Microsoft Purview, allowing you to see a table-level lineage graph. It supports Delta, Azure SQL, Data Lake Gen 2, and more.

    --------------------------------------------------------------

    Thanks for the question and for using the Microsoft Q&A platform.

    Unfortunately, lineage from Azure Databricks notebooks does not show up in Azure Purview out of the box.

    I suggest voting up an idea submitted by another Azure customer:

    https://feedback.azure.com/d365community/search/82d7bddb-fb24-ec11-b6e6-000d3a4f07b8?q=azure+databricks+notebook

    All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

    Alternative option: Azure Purview uses Apache Atlas behind the scenes, so you can probably capture this lineage yourself through the Atlas API.

    Here's an example of where Spline was used to track lineage from notebooks: https://intellishore.dk/data-lineage-from-databricks-to-azure-purview/

    This article talks about how to get started with the Purview REST API: https://techcommunity.microsoft.com/t5/azure-architecture-blog/exploring-purview-s-rest-api-with-python/ba-p/2208058
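As a rough illustration of the Atlas-based option above, the sketch below builds the kind of JSON payload an Atlas-style API accepts for a lineage step: a `Process` entity whose `inputs` and `outputs` point at existing data assets by `qualifiedName`. This is only a sketch under assumptions, not the accelerator's or Purview's documented method: the helper name `build_lineage_process`, the `databricks://notebooks/...` qualified-name scheme, the storage account URLs, and the `azure_blob_path` type name for the files are all hypothetical and should be checked against the types actually present in your catalog.

```python
import json


def build_lineage_process(name, input_paths, output_paths,
                          dataset_type="azure_blob_path"):
    """Build an Atlas-style Process entity linking inputs to outputs.

    dataset_type and the qualifiedName scheme below are assumptions;
    use the type and qualified names your scanned assets really have.
    """
    def ref(path):
        # Reference an existing asset by its unique qualifiedName
        return {"typeName": dataset_type,
                "uniqueAttributes": {"qualifiedName": path}}

    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                # Hypothetical naming scheme for the notebook step
                "qualifiedName": f"databricks://notebooks/{name}",
                "name": name,
                "inputs": [ref(p) for p in input_paths],
                "outputs": [ref(p) for p in output_paths],
            },
        }
    }


# Hypothetical example: two Parquet inputs joined into one output
payload = build_lineage_process(
    "join_parquet",
    ["https://examplestorage.blob.core.windows.net/data/a.parquet",
     "https://examplestorage.blob.core.windows.net/data/b.parquet"],
    ["https://examplestorage.blob.core.windows.net/data/joined.parquet"],
)
print(json.dumps(payload, indent=2))
```

A payload like this would then be sent, with a valid bearer token, to your Purview account's Atlas entity endpoint; the REST API article linked above covers authentication and the exact URLs, which are deliberately left out here.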

    Disclaimer: This response contains a reference to a third-party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet.

    Hope this helps. Please let us know if you have any further queries.


  2. isdataninja 171 Reputation points
    2021-12-03T17:43:57.53+00:00