Hi sysadmin
That’s a great question - and it’s an important distinction when preparing training data for custom trainable classifiers in Microsoft Purview.
Currently, custom trainable classifiers analyze the document content itself, not metadata fields such as title, author, creator, or date. The model focuses on linguistic and semantic patterns within the body text of your documents to learn how to recognize the target category.
Metadata fields are not evaluated during either:
- Training (when you upload your positive and negative samples), or
- Classification (when Purview applies the trained model to your content).
So, you don’t need to include or modify metadata values for the purpose of training. Two documents with identical content but different authors would be treated as effectively the same from a classification perspective.
If metadata-based logic is important for your scenario (for example, classification depending on author or department), you could combine trainable classifiers with sensitive information types or auto-labeling policies that apply based on metadata-driven conditions - but that would be outside of the classifier’s learning process.
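To make the separation concrete, here is a minimal conceptual sketch in Python. It is not Purview's API or implementation: the keyword match is only a stand-in for a trained model, and every name in it (Document, content_classifier, metadata_rule, should_label) is hypothetical. It only illustrates the idea that the content decision ignores metadata, and that any metadata condition is a separate rule layered on top.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    body: str
    metadata: dict = field(default_factory=dict)


def content_classifier(body: str) -> bool:
    """Hypothetical stand-in for a trained classifier: sees body text only."""
    keywords = ("invoice", "payment", "amount due")
    text = body.lower()
    return any(keyword in text for keyword in keywords)


def metadata_rule(metadata: dict) -> bool:
    """Hypothetical metadata-driven condition, evaluated outside the classifier."""
    return metadata.get("department") == "Finance"


def should_label(doc: Document) -> bool:
    # The two checks are independent; the classifier never receives metadata.
    return content_classifier(doc.body) and metadata_rule(doc.metadata)


# Same body text, different author and department (made-up sample data).
doc_a = Document(
    "Invoice 1001: payment due in 30 days.",
    {"author": "Alice", "department": "Finance"},
)
doc_b = Document(
    "Invoice 1001: payment due in 30 days.",
    {"author": "Bob", "department": "Sales"},
)
```

In this sketch content_classifier gives the same answer for doc_a and doc_b because their body text is identical, while should_label differs only because of the separate metadata rule.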
References:
- https://learn.microsoft.com/en-us/purview/data-classification-overview
- https://learn.microsoft.com/en-us/purview/trainable-classifiers-definitions
- https://learn.microsoft.com/en-us/purview/trainable-classifiers-learn-about
- https://learn.microsoft.com/en-us/purview/trainable-classifiers-get-started-with
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".