【Purview】Asset updates at large data volumes

Mofei Zhuang 140 Reputation points Microsoft Employee
2026-02-25T10:23:03.94+00:00

I have run into the following scenario while using Purview.

We expect to have around 20 million assets in a single Purview account in the future. We will update the data daily through a job, but right now I see that Purview does not seem to be able to detect which data has changed.

That means, for example, if I have 20 million assets, in the worst case I would need to update all 20 million assets every day. Even if I update them in batches at a rate of 500 assets per second, it would still take about 11 hours. This is unacceptable, and it would also put a lot of pressure on Purview, as well as on the index updates.
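The back-of-the-envelope math (the 500 assets/second rate is my own estimate of a sustained bulk-update throughput):

```python
# Rough time needed to republish every asset at a fixed batch throughput.
total_assets = 20_000_000
assets_per_second = 500  # assumed sustained bulk-update rate

seconds = total_assets / assets_per_second
hours = seconds / 3600
print(f"{hours:.1f} hours")  # roughly 11 hours for a full refresh
```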

I’d like to ask if there is a good approach or best practice for this scenario.

Microsoft Security | Microsoft Purview

Answer accepted by question author
  1. Pilladi Padma Sai Manisha 4,910 Reputation points Microsoft External Staff Moderator
    2026-02-26T09:52:25.66+00:00

Hi Mofei Zhuang,
    Thank you for reaching out on Microsoft Q&A!

    Doing a full daily refresh for millions of assets is not scalable in Microsoft Purview. Purview does not automatically detect deltas unless the ingestion process sends only incremental changes. If all assets are republished, it triggers full re-indexing and increases processing time.

    The recommended approach is to implement change tracking at the source or ingestion layer and push only incremental updates based on last-modified timestamps, versioning, or events. Using stable IDs with upsert behavior ensures only changed assets are updated, which improves performance and scalability, especially for very large catalogs (for example, 20M+ assets).
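    As a sketch of what delta detection in the ingestion job could look like (the `qualifiedName`/`typeName` fields follow the Atlas entity model that Purview uses; the `lastModified` attribute and watermark handling are assumptions about your own pipeline, not a Purview feature):

    ```python
    from datetime import datetime, timezone

    def select_changed_assets(assets, watermark):
        """Keep only assets modified since the last successful run."""
        return [a for a in assets if a["lastModified"] > watermark]

    def build_upsert_payload(assets):
        """Shape changed assets as an Atlas-style bulk entity payload.

        Purview matches existing entities on typeName + qualifiedName,
        so resending a stable qualifiedName updates the existing asset
        instead of creating a duplicate.
        """
        return {
            "entities": [
                {
                    "typeName": a["typeName"],
                    "attributes": {
                        "qualifiedName": a["qualifiedName"],
                        "name": a["name"],
                    },
                }
                for a in assets
            ]
        }

    # Example: only the asset touched after the watermark is re-published.
    watermark = datetime(2026, 2, 25, tzinfo=timezone.utc)
    assets = [
        {"typeName": "azure_sql_table", "qualifiedName": "mssql://srv/db/t1",
         "name": "t1", "lastModified": datetime(2026, 2, 24, tzinfo=timezone.utc)},
        {"typeName": "azure_sql_table", "qualifiedName": "mssql://srv/db/t2",
         "name": "t2", "lastModified": datetime(2026, 2, 26, tzinfo=timezone.utc)},
    ]
    changed = select_changed_assets(assets, watermark)
    payload = build_upsert_payload(changed)
    print(len(payload["entities"]))  # 1
    ```

    With this pattern, the daily job only sends the slice of the 20M assets that actually changed, which is what keeps ingestion and re-indexing load proportional to the change rate rather than the catalog size.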

    Microsoft Purview is a metadata governance and catalog service, not an OLAP or operational store. It manages metadata such as schemas, lineage, and classifications — not the actual data. That’s why incremental ingestion is strongly recommended instead of full refreshes.

    A side-car database and Purview serve different purposes. A side-car DB can support fast lookups and track changes for delta ingestion, while Purview provides centralized governance, lineage, search, and discovery across the organization. They complement each other rather than replace one another.

    Purview APIs are not designed for ultra-low latency (~10 ms) scenarios. For real-time or operational querying, a lightweight operational store (such as a side-car DB or Kusto) should be used, while Purview remains the governance layer.
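    A minimal sketch of that split (names here are illustrative; in practice the store would be a real database or Kusto table that the ingestion job writes to alongside Purview):

    ```python
    class SidecarCatalog:
        """Tiny in-memory stand-in for an operational metadata store
        keyed by qualifiedName.

        The side-car store serves hot, low-latency lookups locally,
        while Purview remains the governance, lineage, and search layer.
        """

        def __init__(self):
            self._by_qn = {}

        def upsert(self, qualified_name, metadata):
            self._by_qn[qualified_name] = metadata

        def lookup(self, qualified_name):
            # O(1) local read instead of a Purview REST round trip.
            return self._by_qn.get(qualified_name)

    store = SidecarCatalog()
    store.upsert("mssql://srv/db/t1",
                 {"owner": "data-team", "classification": "PII"})
    print(store.lookup("mssql://srv/db/t1")["owner"])  # data-team
    ```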

    Currently, Purview does not provide a native bulk export or direct pipeline to sync the full Data Map to Kusto. The supported approach is to use Data Map REST/Search APIs and retrieve data incrementally using filters (such as updated time, entity type, or collection). If a secondary store is required, the most efficient design is to write metadata to both systems during ingestion rather than exporting everything from Purview later.
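    As a sketch of incremental retrieval, a Search API request body can carry such filters; the exact filter attribute names below (`entityType`, `updateTime`, the operator shape) are placeholders to verify against the Data Map search documentation for your API version:

    ```python
    def build_incremental_query(entity_type, updated_after_epoch_ms, limit=50):
        """Build a Purview Data Map search request body asking only for
        one entity type updated after a given timestamp.

        Filter field names are illustrative placeholders; adapt them to
        the documented search filter schema.
        """
        return {
            "keywords": "*",
            "limit": limit,
            "filter": {
                "and": [
                    {"entityType": entity_type},
                    {"attributeName": "updateTime",
                     "operator": "gt",
                     "attributeValue": updated_after_epoch_ms},
                ]
            },
        }

    query = build_incremental_query("azure_sql_table", 1_770_000_000_000)
    print(query["filter"]["and"][0]["entityType"])  # azure_sql_table
    ```

    Paging through results of such a filtered query run after each ingestion cycle is far cheaper than attempting a full Data Map export.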

    Although Purview’s metadata model is derived from Apache Atlas, the underlying components (including Kafka/event streams) are fully managed and not exposed. There is no customer-accessible Kafka stream. Supported integrations must rely on public REST/Search APIs or capture changes during ingestion.

