Do custom trainable classifiers in Purview analyze document metadata fields, or just document content?

sysadmin 140 Reputation points
2025-10-06T22:46:52.3966667+00:00

I am looking at setting up a custom Purview trainable classifier.

This requires me to provide 500+ samples of positive training documents.

What I want to know is, should I be ensuring the documents have exemplar values in metadata fields, by which I mean title, author/creator/person fields, date fields - or are these just ignored by the machine learning training process?

What about when it actually comes time to categorize the documents? If I have two identical documents but the authors are different, would it be reasonable for me to expect Purview to potentially categorize them differently, or is that beyond its capabilities?

(please no automatically generated AI responses, or if you must, please check that what it spat out is actually true)

Microsoft Security | Microsoft Purview

Answer accepted by question author

Smaran Thoomu 35,125 Reputation points Microsoft External Staff Moderator
2025-10-07T13:08:56.1333333+00:00

Hi sysadmin

That’s a great question - and it’s an important distinction when preparing training data for custom trainable classifiers in Microsoft Purview.

Currently, custom trainable classifiers analyze the document content itself, not metadata fields such as title, author, creator, or date. The model focuses on linguistic and semantic patterns within the body text of your documents to learn how to recognize the target category.

Metadata fields are not evaluated during either:

Training (when you upload your positive and negative samples), or

Classification (when Purview applies the trained model to your content).

So, you don’t need to include or modify metadata values for the purpose of training. Even if two documents have identical content but different authors, Purview would treat them as effectively the same from a classification perspective.

If metadata-based logic is important for your scenario (for example, classification depending on author or department), you could combine trainable classifiers with sensitive information types or auto-labeling policies that apply based on metadata-driven conditions - but that would be outside of the classifier’s learning process.
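
To make the distinction concrete, here is a minimal sketch in Python. It does not use any Purview API: `Document`, `content_classifier`, `metadata_condition`, and `should_label` are hypothetical names, and the keyword check is only a stand-in for a trained model. It illustrates the behavior described above, where the classifier sees only body text and any author-based logic lives in a separate rule.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    body: str
    metadata: dict = field(default_factory=dict)


def content_classifier(doc: Document) -> bool:
    """Hypothetical stand-in for a trained classifier: reads body text only."""
    return "invoice" in doc.body.lower()


def metadata_condition(doc: Document) -> bool:
    """Hypothetical separate rule evaluated on metadata, outside the classifier."""
    return doc.metadata.get("author") == "finance-team"


def should_label(doc: Document) -> bool:
    """Combine the content verdict with the metadata rule."""
    return content_classifier(doc) and metadata_condition(doc)


# Two documents with identical content but different authors.
doc_a = Document("Invoice for Q3 services", {"author": "finance-team"})
doc_b = Document("Invoice for Q3 services", {"author": "someone-else"})

# The content-only classifier cannot tell them apart.
same_verdict = content_classifier(doc_a) == content_classifier(doc_b)

# Only the combined policy distinguishes them.
labels = (should_label(doc_a), should_label(doc_b))
```

In this sketch the two documents get the same classifier verdict, and only the added metadata rule makes the final outcome differ.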

Hope this helps. Do let us know if you have any further queries.


