Trainable Classifier – “Insufficient Samples” error while creating classifier for AI-generated Offer Letters for testing purpose

GARRE AKHIL 45 Reputation points
2026-03-16T11:17:22.8333333+00:00

Hi Community,

I am trying to create a Trainable Classifier in Microsoft Purview to detect Offer Letters, for testing purposes.

Objective: We are using AI tools to generate bulk employee offer letters, and I want Microsoft Purview to automatically classify these documents as Offer Letter so that we can apply compliance policies such as DLP, retention, or auto-labeling.

Issue: While creating the classifier and providing seed content, I keep getting the error “Insufficient samples” even after uploading multiple documents.

What I have tried so far:

  • Created a new Trainable Classifier in Microsoft Purview.
  • Uploaded offer letter documents as positive seed content.
  • Referred to Microsoft documentation for trainable classifiers.
  • Uploaded multiple samples, but the system still reports insufficient samples.

My Questions:

Does Microsoft Purview require 50–150 unique documents for positive seed content for a trainable classifier?

Do these documents need to be structurally different templates, or can they be similar offer letters with different employee details?

Since these offer letters are AI-generated, will having similar formatting affect the classifier training?

Is there a recommended minimum number of negative samples that should also be uploaded?

Goal: I want Purview to reliably detect Offer Letters generated through AI tools so that compliance policies can automatically apply to them.

If anyone has implemented a trainable classifier for HR documents such as offer letters, I would appreciate guidance on:

  • Required sample size
  • Best practices for seed content preparation
  • Any common reasons for the “insufficient samples” error

Thanks in advance for your support.

Microsoft Security | Microsoft Purview

1 answer

  1. Smaran Thoomu 35,375 Reputation points Microsoft External Staff Moderator
    2026-03-17T06:44:32.0633333+00:00

    @GARRE AKHIL Hi Garre Akhil, it sounds like you’re running into the seed-content requirements for Purview trainable classifiers. Here is a summary of those requirements, with some tips for getting past the “Insufficient samples” error:

    1. Minimum seed-data requirements
       • Positive samples: you need at least 50 unique docs (up to 500).
       • Negative samples: you need at least 150 (up to 1,500).
       • Microsoft recommends a 1:3 ratio (for every positive, three negatives) for best results.
    2. Unique vs. template docs
       • You can use the same offer-letter template with different employee data, but make sure each file has enough unique content (e.g. >200 words, different names, dates, terms).
       • Avoid uploading identical copies of the same document (for example, 10 identical docs). Purview looks for distinct examples to build its model.
    3. Formatting consistency
       • Having the same header/footer or layout is OK as long as the body text varies meaningfully.
       • Very short or boilerplate-only files may be skipped or deemed “insufficient.”
    4. SharePoint folder setup & indexing
       • Put positives in one SharePoint folder and negatives in another (no Teams folders).
       • Wait ~1 hour after creating those folders so SharePoint can index them before you point your classifier at them.
    5. Processing time
       • After you submit your seed locations, Purview takes up to 24 hrs to ingest and test the samples.
       • During that time the status shows “In progress.” You can publish only after processing has finished.
    6. Double-check file support
       • Docs >20 MB only get metadata pulled, so keep individual files under that.
       • Purview supports common document formats (DOCX, PDF, etc.) for trainable classifiers.
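Before uploading, the counts in the checklist above can be sanity-checked with a small script run against local copies of the two seed folders. This is only a rough sketch: the thresholds are the ones quoted in this answer, the function names (`unique_files`, `count_problems`) are made up for illustration, and nothing here calls a Purview API. It also only catches byte-identical duplicates, so near-identical letters that differ by a name or date will still count as unique.

```python
import hashlib
from pathlib import Path

# Thresholds as quoted in the answer above (assumption: verify against current docs).
MIN_POSITIVE, MAX_POSITIVE = 50, 500
MIN_NEGATIVE, MAX_NEGATIVE = 150, 1500
MAX_FILE_BYTES = 20 * 1024 * 1024  # 20 MB per-file limit mentioned in the answer


def unique_files(folder):
    """Return one path per distinct file content, skipping files over the size limit."""
    seen = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.stat().st_size > MAX_FILE_BYTES:
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        seen.setdefault(digest, path)  # keep the first file seen for each hash
    return list(seen.values())


def count_problems(pos, neg):
    """Compare unique positive/negative counts with the thresholds; return a list of issues."""
    problems = []
    if pos < MIN_POSITIVE:
        problems.append(f"positives: {pos} unique files, need at least {MIN_POSITIVE}")
    elif pos > MAX_POSITIVE:
        problems.append(f"positives: {pos} unique files, more than {MAX_POSITIVE}")
    if neg < MIN_NEGATIVE:
        problems.append(f"negatives: {neg} unique files, need at least {MIN_NEGATIVE}")
    elif neg > MAX_NEGATIVE:
        problems.append(f"negatives: {neg} unique files, more than {MAX_NEGATIVE}")
    if neg < 3 * pos:
        problems.append(f"ratio: {neg} negatives for {pos} positives, below 1:3")
    return problems
```

For example, `count_problems(len(unique_files("seed/positive")), len(unique_files("seed/negative")))` returns an empty list when both folders meet the counts; the folder paths here are placeholders.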

    Once you’ve hit the 50/150 threshold with sufficiently distinct content and let the system process your folders, the “Insufficient samples” error should go away. After training completes, test your classifier against known positives/negatives, give feedback on any misclassifications, then publish it and use it in your DLP, retention, or auto-labeling policies.
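When testing against known positives and negatives, it can help to keep a simple tally of how the classifier did before deciding to publish. The sketch below assumes you have noted the outcomes by hand as (file name, is it really an offer letter, did the classifier match it); the `summarize` function and that record shape are illustrative only and are not something Purview exports in this form.

```python
def summarize(results):
    """results: list of (name, is_offer_letter, classifier_matched) tuples.

    Returns (precision, recall, misclassified_names).
    """
    tp = sum(1 for _, actual, pred in results if actual and pred)
    fp = sum(1 for _, actual, pred in results if not actual and pred)
    fn = sum(1 for _, actual, pred in results if actual and not pred)
    # Guard against division by zero when there are no matches or no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    misclassified = [name for name, actual, pred in results if actual != pred]
    return precision, recall, misclassified
```

The misclassified names are the items worth reviewing and giving feedback on; low recall suggests the positive seed set is too narrow, while low precision suggests the negative set does not look enough like real non-offer-letter HR documents.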

    Reference docs

    • Get started with trainable classifiers (seed counts, workflow):

    https://learn.microsoft.com/purview/trainable-classifiers-get-started-with

    • Learn about trainable classifiers (positive vs. negative sets):

    https://learn.microsoft.com/purview/trainable-classifiers-learn-about#custom-classifiers

    • Classification best practices (supported file types, size limits):

    https://learn.microsoft.com/azure/purview/concept-best-practices-classification#classification-considerations

    Hope this helps you get past “Insufficient samples” and into a reliably trained Offer Letter model! Let me know if you hit any other snags.

    Note: This content was drafted with the help of an AI system. Please verify the information before relying on it for decision-making.
