Hi Mofei Zhuang
Thank you for reaching out to Microsoft Q&A!
Doing a full daily refresh for millions of assets is not scalable in Microsoft Purview. Purview does not detect deltas on its own; it processes whatever the ingestion process sends, so changes are handled incrementally only if the ingestion process sends only incremental changes. If all assets are republished, it triggers full re-indexing and increases processing time.
The recommended approach is to implement change tracking at the source or ingestion layer and push only incremental updates based on last-modified timestamps, versioning, or events. Using stable IDs with upsert behavior ensures only changed assets are updated, which improves performance and scalability, especially for very large catalogs (for example, 20M+ assets).
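As a rough illustration of the change-tracking idea above, here is a minimal Python sketch that filters a source listing down to the assets worth republishing. It makes no Purview calls. The `Asset` class, the `select_delta` function, the watermark, and the fingerprint store are all hypothetical names used for this example, and it assumes your source can report a last-modified timestamp and a stable qualified name for each asset.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class Asset:
    """Hypothetical source-side record; not a Purview type."""
    qualified_name: str      # stable ID used as the upsert key
    last_modified: datetime  # reported by the source system
    attributes: dict         # metadata that would be published


def fingerprint(asset: Asset) -> str:
    """Hash the published attributes so unchanged assets can be skipped."""
    payload = json.dumps(asset.attributes, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def select_delta(
    assets: Iterable[Asset],
    watermark: datetime,
    known_fingerprints: Dict[str, str],
) -> List[Asset]:
    """Return only assets modified after the watermark whose content changed."""
    changed = []
    for asset in assets:
        if asset.last_modified <= watermark:
            continue  # not touched since the last successful run
        if known_fingerprints.get(asset.qualified_name) == fingerprint(asset):
            continue  # timestamp moved but the metadata is identical
        changed.append(asset)
    return changed
```

Only the list returned by `select_delta` would then be sent to the catalog as upserts keyed on the qualified name. After a successful run you would advance the watermark and store the new fingerprints.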
Microsoft Purview is a metadata governance and catalog service, not an OLAP or operational store. It manages metadata such as schemas, lineage, and classifications, not the actual data. That is why incremental ingestion is strongly recommended instead of full refreshes.
A side-car database and Purview serve different purposes. A side-car DB can support fast lookups and track changes for delta ingestion, while Purview provides centralized governance, lineage, search, and discovery across the organization. They complement each other rather than replace one another.
Purview APIs are not designed for ultra-low latency (~10 ms) scenarios. For real-time or operational querying, a lightweight operational store (such as a side-car DB or Kusto) should be used, while Purview remains the governance layer.
Currently, Purview does not provide a native bulk export or direct pipeline to sync the full Data Map to Kusto. The supported approach is to use Data Map REST/Search APIs and retrieve data incrementally using filters (such as updated time, entity type, or collection). If a secondary store is required, the most efficient design is to write metadata to both systems during ingestion rather than exporting everything from Purview later.
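The "write to both systems during ingestion" design mentioned above could look roughly like the sketch below. The two sinks are plain callables that you would supply yourself (for example, one wrapping your Purview upsert call and one wrapping your Kusto or side-car write). The names `dual_write`, `catalog_sink`, and `secondary_sink` are hypothetical and are not part of any Purview or Kusto SDK.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Sink = Callable[[List[Record]], None]  # hypothetical: takes one batch of records


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def dual_write(
    records: Iterable[Record],
    catalog_sink: Sink,
    secondary_sink: Sink,
    batch_size: int = 100,
) -> int:
    """Send each batch of changed metadata to both stores; return the count written."""
    written = 0
    for batch in batched(records, batch_size):
        catalog_sink(batch)    # governance layer
        secondary_sink(batch)  # operational / low-latency store
        written += len(batch)
    return written
```

This sketch does not handle the case where one sink succeeds and the other fails. In practice you would likely want retries or a replay queue so that the two stores do not drift apart.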
Although Purview’s metadata model is derived from Apache Atlas, the underlying components (including Kafka/event streams) are fully managed and not exposed. There is no customer-accessible Kafka stream. Supported integrations must rely on public REST/Search APIs or capture changes during ingestion.