Hi @Piotr Tybulewicz
Thank you for reaching out on Microsoft Q&A. Below are a few mitigation steps that may help you address the query.
The failures weren’t due to Purview’s handling of internal _materialization_mat* tables. The actual issue was resource constraints on the Kubernetes Integration Runtime (IR). When the node hit its CPU limit, it restarted mid-scan, causing the entire job to fail.
Why Managed IR Behaved Differently
Managed IR runs on Microsoft-hosted infrastructure with auto-scaling and sufficient resources, so transient query errors (like those on internal system tables) don’t cause the scan to fail. Kubernetes IR, on the other hand, is self-hosted—so if the node restarts due to resource exhaustion, the scan cannot recover and is marked as failed.
Approach
Increase Kubernetes Node Size
- Ensure the node has enough CPU and memory to handle Purview scan workloads.
- For Databricks Unity Catalog scans, size for peak usage rather than minimum specs (a quick sizing check is sketched after this list).
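If it helps, here is a minimal sizing check, assuming the `kubernetes` Python client and kubeconfig access to the cluster that hosts the self-hosted IR. The CPU and memory thresholds are illustrative placeholders, not official Purview requirements, so adjust them to your scan workload.

```python
# Minimal node-sizing check. Assumes the `kubernetes` Python client is
# installed and your kubeconfig points at the cluster hosting the IR.
# The thresholds below are illustrative, not official Purview requirements.
from kubernetes import client, config

MIN_CPU_CORES = 4        # illustrative threshold
MIN_MEMORY_GIB = 16      # illustrative threshold

def parse_cpu(value: str) -> float:
    """Convert Kubernetes CPU quantities ('4', '4000m') to cores."""
    return float(value[:-1]) / 1000 if value.endswith("m") else float(value)

def parse_memory_gib(value: str) -> float:
    """Convert Kubernetes memory quantities ('16Gi', '16384Mi', '16777216Ki') to GiB."""
    units = {"Ki": 1 / (1024 ** 2), "Mi": 1 / 1024, "Gi": 1.0, "Ti": 1024.0}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value) / (1024 ** 3)  # plain bytes

config.load_kube_config()
for node in client.CoreV1Api().list_node().items:
    alloc = node.status.allocatable
    cpu = parse_cpu(alloc["cpu"])
    mem = parse_memory_gib(alloc["memory"])
    ok = cpu >= MIN_CPU_CORES and mem >= MIN_MEMORY_GIB
    print(f"{node.metadata.name}: {cpu:.1f} CPU, {mem:.1f} GiB "
          f"{'OK' if ok else 'below the illustrative minimum'}")
```

Running a check like this before kicking off a large Unity Catalog scan gives a quick signal on whether the node pool needs to be scaled up first.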
Monitor Resource Utilization
- Use Kubernetes metrics or Azure Monitor to track CPU/memory during scans.
- Set alerts for resource saturation to prevent unexpected restarts (see the monitoring sketch after this list).
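As a rough illustration, the sketch below polls node CPU utilization from the Kubernetes metrics API while a scan is running. It assumes metrics-server is installed in the cluster and uses the same `kubernetes` Python client; the 90% saturation threshold is an arbitrary example, not a Purview recommendation.

```python
# Lightweight utilization watcher. Assumes metrics-server is installed
# (it serves the metrics.k8s.io API) and the `kubernetes` Python client
# is available. Run it while a Purview scan is in progress.
import time
from kubernetes import client, config

ALERT_THRESHOLD = 0.90   # illustrative saturation threshold

config.load_kube_config()
core = client.CoreV1Api()
metrics = client.CustomObjectsApi()

# Allocatable CPU per node, for computing utilization percentages.
capacity = {}
for node in core.list_node().items:
    cpu = node.status.allocatable["cpu"]
    capacity[node.metadata.name] = (
        float(cpu[:-1]) / 1000 if cpu.endswith("m") else float(cpu)
    )

while True:
    usage = metrics.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
    for item in usage["items"]:
        name = item["metadata"]["name"]
        raw = item["usage"]["cpu"]           # e.g. '250m' or '153554778n'
        if raw.endswith("n"):
            used = float(raw[:-1]) / 1e9
        elif raw.endswith("m"):
            used = float(raw[:-1]) / 1000
        else:
            used = float(raw)
        pct = used / capacity[name]
        flag = "ALERT: near saturation" if pct >= ALERT_THRESHOLD else "ok"
        print(f"{name}: CPU {pct:.0%} ({flag})")
    time.sleep(30)
```

If node metrics already flow into Azure Monitor (for example through Container insights), the same threshold can instead be expressed as a metric alert rule rather than a polling script.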
Optional: Use Managed IR for Stability
- If scaling Kubernetes IR is not feasible, Managed IR is more resilient for large or complex scans.
About the Internal Tables
The _materialization_mat* tables are internal to Databricks and not accessible to user identities. Permission errors on these tables are expected and harmless. Purview ignores them when using Managed IR. With Kubernetes IR, once resource sizing is fixed, these errors will no longer cause the scan to fail.
Guidance
Customers do not need to avoid Kubernetes IR for Unity Catalog scans. Just make sure:
- Nodes are properly sized.
- Resource monitoring is in place.
- Permission errors on internal tables are understood to be normal and safe to ignore.