Create SIT that will scan metadata (the /docProps/core.xml)

Bryan Schwering 20 Reputation points
2025-10-08T01:20:08.9233333+00:00

I have data stored within a custom metadata field called Tags that I'd like Purview to scan. This data is accessible in the /docProps/core.xml file. However, I am unable to create a Purview SIT that will scan the data stored in the /docProps/core.xml file of the office document. Is there some way to do this?

Thanks,

Bryan

Microsoft Security | Microsoft Purview

Answer accepted by question author
  1. Pratyush Vashistha 4,575 Reputation points Microsoft External Staff Moderator
    2025-10-08T09:49:06.5066667+00:00

    Hello Bryan Schwering

    Thank you for your question on Microsoft Q&A!

    As I understand it, you are trying to create a Microsoft Purview sensitive information type (SIT) that scans custom metadata tags stored in the /docProps/core.xml file inside Office documents, but Purview does not recognize or scan that content. Let me know if my understanding is correct.

    Currently, Microsoft Purview’s built-in classifiers and SITs are designed to scan the main document content (e.g., body text in Word, cells in Excel) and some standard document properties, but they do not parse or extract data from internal package files like /docProps/core.xml, which contains core metadata such as title, subject, or tags added via Office applications.

    To help clarify your scenario further:

    • Are these custom tags added via the Office UI (e.g., under File > Info > Properties > Advanced Properties > Custom tab), or are they embedded programmatically into the document package?
    • Have you confirmed that the tags appear in the document’s visible metadata when viewed in Windows File Explorer or through PowerShell/Office APIs?
    • Are you using Microsoft Purview Information Protection (for labeling/classification) or Microsoft Purview Data Map (for scanning data sources like SharePoint/OneDrive)?
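
For the second check above, a minimal Python sketch (not part of any Purview tooling) can show what is actually stored in /docProps/core.xml. It assumes the file is a standard Office Open XML package (a ZIP archive) and that the Tags box in the Office UI is saved as cp:keywords, which is my understanding but worth confirming on your own files. `read_core_properties` and `build_sample_package` are hypothetical helper names used only for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespaces used by the core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(source):
    """Return selected core properties from an Office Open XML package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text(path):
        node = root.find(path, NS)
        return node.text if node is not None else None

    return {
        "title": text("dc:title"),
        "subject": text("dc:subject"),
        "keywords": text("cp:keywords"),  # assumed to hold the UI "Tags" value
        "category": text("cp:category"),
    }


def build_sample_package(keywords):
    """Build a tiny in-memory ZIP holding only docProps/core.xml (demo data)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:title>Sample</dc:title>"
        f"<cp:keywords>{escape(keywords)}</cp:keywords>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(build_sample_package("Finance; Confidential")))
```

To inspect a real file, pass its path instead, for example `read_core_properties("report.docx")`. Properties added on the Custom tab are stored in a separate part, docProps/custom.xml, so they would not appear in this output.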

    As of now, Purview does not support custom parsing of internal Office Open XML package parts (like core.xml) during automated scans or classification. If your tags are stored as standard Office document properties (e.g., “Keywords” or “Category”), those may be detectable, but custom XML-level fields typically are not.

    For official context on what metadata Purview can classify, refer to: https://learn.microsoft.com/en-us/microsoft-365/compliance/sensitivity-labels-office-apps?view=o365-worldwide and https://learn.microsoft.com/en-us/purview/sensitivity-labels

    Let me know the answers to the questions above so I can better assess whether a workaround or alternative approach might be possible.

    Please "Accept as Answer" if the answer provided is useful, so that you can help others in the community looking for remediation for similar issues.

    Thanks

    Pratyush

