@GARRE AKHIL Hey Garre Akhil, it sounds like you’re bumping into the built-in seeding requirements for Purview trainable classifiers. Here’s the quick gist and some tips to get past the “Insufficient samples” error:
- Minimum seed-data requirements
  • Positive samples: you need at least 50 unique docs (up to 500).
  • Negative samples: you need at least 150 (up to 1,500).
  • Microsoft recommends a 1:3 ratio (three negatives for every positive) for best results.
- Unique vs. template docs
  • You can reuse the same offer-letter template with different employee data, but make sure each file has enough unique content (e.g. >200 words, different names, dates, and terms).
  • Avoid uploading identical copies of the same doc; Purview needs distinct examples to build its model.
- Formatting consistency
  • Having the same header/footer or layout is OK as long as the body text varies meaningfully.
  • Very short or boilerplate-only files may be skipped or counted as “insufficient.”
- SharePoint folder setup & indexing
  • Put positives in one SharePoint folder and negatives in another (not Teams folders).
  • Wait ~1 hour after creating those folders so SharePoint can index them before you point your classifier at them.
- Processing time
  • After you submit your seed locations, Purview takes up to 24 hours to ingest and test the samples.
  • During that time the status shows “In progress,” and you can publish only once it has finished.
- Double-check file support
  • Files larger than 20 MB only have their metadata extracted, so keep individual files under that size.
  • Trainable classifiers support common document formats such as Office files (e.g. DOCX) and PDF.
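If you want to sanity-check your seed sets before uploading them, here's a rough local pre-flight script. It's only a sketch under a few assumptions: you stage the positives and negatives in two local folders first, the files are plain-text (.txt) exports so words can be counted without extra libraries, and the thresholds are the ones listed above (50-500 positives, 150-1,500 negatives, 1:3 ratio, 20 MB, roughly 200 words). `check_folder` and `check_seed_set` are made-up names for illustration; this is not a Purview tool, and Purview's own checks may differ. The duplicate check only catches byte-identical files, not near-duplicates.

```python
# Local pre-flight check for trainable-classifier seed folders (a sketch).
# Assumptions, not Purview behavior: samples are staged in two local
# folders before upload, and files are plain-text (.txt) exports so
# words can be counted without extra libraries.
import hashlib
from pathlib import Path

POS_RANGE = (50, 500)          # positive samples: min, max
NEG_RANGE = (150, 1500)        # negative samples: min, max
MAX_BYTES = 20 * 1024 * 1024   # above ~20 MB only metadata is extracted
MIN_WORDS = 200                # rough "enough unique content" heuristic


def count_files(folder: Path) -> int:
    return sum(1 for p in folder.iterdir() if p.is_file())


def check_folder(folder: Path, count_range: tuple[int, int]) -> list[str]:
    """Return human-readable problems found in one seed folder."""
    problems: list[str] = []
    files = sorted(p for p in folder.iterdir() if p.is_file())
    low, high = count_range
    if not low <= len(files) <= high:
        problems.append(f"{folder.name}: {len(files)} files, need {low}-{high}")

    seen: dict[str, str] = {}  # SHA-256 digest -> first file with that content
    for path in files:
        data = path.read_bytes()
        if len(data) > MAX_BYTES:
            problems.append(f"{path.name}: larger than 20 MB")
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            # Only byte-identical copies are caught, not near-duplicates.
            problems.append(f"{path.name}: identical to {seen[digest]}")
        else:
            seen[digest] = path.name
        words = len(data.decode("utf-8", errors="ignore").split())
        if words < MIN_WORDS:
            problems.append(f"{path.name}: only {words} words")
    return problems


def check_seed_set(pos_dir: Path, neg_dir: Path) -> list[str]:
    """Check both folders plus the recommended 1:3 positive:negative ratio."""
    problems = check_folder(pos_dir, POS_RANGE) + check_folder(neg_dir, NEG_RANGE)
    n_pos, n_neg = count_files(pos_dir), count_files(neg_dir)
    if n_pos and n_neg < 3 * n_pos:
        problems.append(f"ratio 1:{n_neg / n_pos:.1f} is below the recommended 1:3")
    return problems
```

For example, `check_seed_set(Path("positives"), Path("negatives"))` returns an empty list when nothing looks off; otherwise each string names the file (or folder) and the problem, so you can fix those before moving the files into SharePoint.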
Once you’ve hit the 50/150 threshold with sufficiently distinct content and let the system process your folders, the “Insufficient samples” error should go away. After training completes, test your classifier against known positives/negatives, give feedback on any mis-classifications, then publish it to your DLP, retention or auto-labeling policies.
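When you spot-check the trained classifier, a simple tally of your results makes the “ready to publish?” call easier and gives you the list of files to send feedback on. The sketch below is hypothetical: it assumes you record each test document by hand (whether it really is an offer letter, and whether the classifier matched it). Purview doesn't export results in this format, and `SpotCheck`, `summarize`, and `misclassified` are illustrative names only.

```python
# Tally of a manual spot-check (hypothetical record format, not a Purview export).
from dataclasses import dataclass


@dataclass
class SpotCheck:
    filename: str
    is_offer_letter: bool   # ground truth you already know
    classifier_match: bool  # what the classifier reported for this file


def summarize(results: list[SpotCheck]) -> dict[str, float]:
    """Count true/false positives/negatives and derive precision and recall."""
    tp = sum(r.is_offer_letter and r.classifier_match for r in results)
    fp = sum(not r.is_offer_letter and r.classifier_match for r in results)
    fn = sum(r.is_offer_letter and not r.classifier_match for r in results)
    tn = len(results) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


def misclassified(results: list[SpotCheck]) -> list[str]:
    """Files where the classifier disagreed with the known label."""
    return [r.filename for r in results if r.is_offer_letter != r.classifier_match]
```

Low precision means non-offer-letters are being matched (add more varied negatives); low recall means real offer letters are being missed (add more varied positives). The `misclassified` list is what you'd mark as wrong when giving feedback.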
Reference docs
• Get started with trainable classifiers (seed counts, workflow):
https://learn.microsoft.com/purview/trainable-classifiers-get-started-with
• Learn about trainable classifiers (positive vs. negative sets):
https://learn.microsoft.com/purview/trainable-classifiers-learn-about#custom-classifiers
• Classification best practices (supported file types, size limits):
Hope this helps you get past “Insufficient samples” and into a reliably trained Offer Letter model! Let me know if you hit any other snags.
Note: This content was drafted with the help of an AI system. Please verify the information before relying on it for decision-making.