Query on Databricks Notebook IDs in Purview Lineage

Mohammed Aamer 160 Reputation points
2026-02-17T07:37:41.1933333+00:00

Hi Team,

I am using Microsoft Purview to scan Azure Databricks Unity Catalog for table and column lineage. In our project, Databricks notebooks handle the transformations and ADF handles orchestration. We have 15 Databricks notebooks covering all transformations, and the jobs have been running in production for several months. We run incremental scans in Purview (not full scans).

While reviewing lineage in Purview, I noticed that although the same 15 notebooks are used, more than 200 notebook IDs appear in lineage.

From a data governance perspective, we want to add asset descriptions to these notebook nodes. However, if the notebook ID keeps changing, the descriptions entered earlier become obsolete.

Could you please clarify:

  1. Is our observation correct that a new notebook ID is generated for each run/execution of a Databricks notebook in lineage?
  2. If yes, is there any way to map or anchor these run-based notebook IDs to a stable notebook asset description (e.g., notebook path/name) so governance metadata like descriptions can be maintained consistently?

Please see the screenshots below for reference.

[Screenshot: Lineage]
[Screenshot: Asset Desc]

Regards,
Aamer

Microsoft Security | Microsoft Purview

Answer accepted by question author
  1. Smaran Thoomu 33,840 Reputation points Microsoft External Staff Moderator
    2026-02-17T07:47:10.2166667+00:00

    Hi Mohammed Aamer
    Good observation. What you’re seeing is expected given how Databricks lineage is captured today.

    In Purview, the notebook nodes that appear in lineage are often tied to the execution/run context, not just the logical notebook file. So when the same notebook runs multiple times (for example through ADF orchestration), Purview can show multiple notebook assets with different IDs. These IDs are system-generated and can look like new notebooks even though the source notebook is the same.

    So yes, it can result in many notebook IDs over time, especially in active production environments.

    Right now, there isn’t a way to “anchor” all those run-based IDs to one stable notebook asset in Purview. The lineage is reflecting execution history rather than a single static notebook object.

    For governance, most teams handle this by:

    • Using the notebook path/name in Databricks as the main reference
    • Adding descriptions at the table or data asset level instead of each notebook node
    • Treating notebook nodes more as technical lineage artifacts than governed assets
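
    As a rough illustration of the first point, the sketch below groups run-based notebook node IDs under their notebook path, so one description per path can be looked up for any node. It assumes you already have the lineage nodes in hand as plain records. The field names `id` and `notebook_path`, the sample IDs, and the helper names are hypothetical, and the code makes no Purview or Databricks API calls.

```python
from collections import defaultdict


def group_nodes_by_path(nodes):
    """Group run-based notebook node IDs under their notebook path."""
    grouped = defaultdict(list)
    for node in nodes:
        # Strip a trailing slash so both spellings of a path land in one group.
        path = node["notebook_path"].rstrip("/")
        grouped[path].append(node["id"])
    return dict(grouped)


def describe_nodes(nodes, descriptions):
    """Return {node_id: description} for nodes whose path has a description."""
    result = {}
    for path, node_ids in group_nodes_by_path(nodes).items():
        text = descriptions.get(path)
        if text is None:
            continue  # no description maintained for this notebook path
        for node_id in node_ids:
            result[node_id] = text
    return result


# Hypothetical sample records; real exports will have different fields.
sample_nodes = [
    {"id": "nb-run-001", "notebook_path": "/Repos/etl/load_sales"},
    {"id": "nb-run-002", "notebook_path": "/Repos/etl/load_sales/"},
    {"id": "nb-run-003", "notebook_path": "/Repos/etl/clean_customers"},
]

# One description per notebook path, maintained outside the lineage nodes.
descriptions = {
    "/Repos/etl/load_sales": "Loads daily sales into the curated layer.",
}

grouped = group_nodes_by_path(sample_nodes)
mapping = describe_nodes(sample_nodes, descriptions)
```

    Keeping descriptions keyed by path means new run-based IDs need no extra upkeep; they simply resolve to the existing entry for their notebook.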

    This is current product behavior rather than a scan issue, so switching between incremental and full scans is unlikely to change it much.

    If Microsoft improves notebook normalization in lineage in the future, this should become easier to manage.

    Hope this clarifies what you’re seeing.

