A Microsoft Purview trainable classifier is a tool you can train to recognize various types of content by giving it samples to look at. Once it's trained, you can use it to identify items for the application of Office sensitivity labels, Communications compliance policies, and retention label policies.
Two steps are required to implement a custom trainable classifier:
1. Provide two sets of sample data (selected by humans):
   - A set that contains only items that belong in the category.
   - A set that contains only items that do not belong in the category.
2. Test the classifier's ability to detect matches.
This article explains how to create and test a custom classifier.
If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview trials hub. Learn details about signing up and trial terms.
Prerequisites
Licensing requirements
Classifiers are a Microsoft 365 E5 or E5 Compliance feature. You must have one of these subscriptions to make use of them.
Permissions
To use classifiers in the following scenarios, you need the following permissions:
Scenario | Required role permissions
Retention label policy | Record Management, Retention Management
Sensitivity label policy | Security Administrator, Compliance Administrator, Compliance Data Administrator
To ensure that your trainable classifier can independently and accurately identify that an item belongs to a particular category of content, you must present it with many samples of the type of content that is in the category. This feeding of samples to the trainable classifier is known as seeding. A human must be the one to select seed content, and that content must include two sets of data: one that contains only items that strongly represent the content the classifier is designed to detect (positive samples) and a second set of items that clearly don't belong (negative samples).
Training a classifier requires at least 50 positive samples (up to 500) and at least 150 negative samples (up to 1,500). The more samples you provide, the more accurate the classifier's predictions will be. The trainable classifier processes up to the 2,000 most recently created samples (by file created date/time stamp).
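The sample limits above can be checked before any content is uploaded. The following is a minimal sketch, not part of Microsoft Purview: the names `Sample`, `check_seed_counts`, and `most_recent` are hypothetical, and the code only mirrors the stated numbers (50 to 500 positive, 150 to 1,500 negative, and the 2,000 most recently created samples).

```python
from dataclasses import dataclass
from datetime import datetime

# Limits taken from the text above; the names are illustrative assumptions.
POSITIVE_RANGE = (50, 500)
NEGATIVE_RANGE = (150, 1500)
MAX_PROCESSED = 2000


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one seed file."""
    name: str
    created: datetime


def check_seed_counts(positive_count, negative_count):
    """Return a list of problems; an empty list means both counts are in range."""
    problems = []
    for label, count, (low, high) in (
        ("positive", positive_count, POSITIVE_RANGE),
        ("negative", negative_count, NEGATIVE_RANGE),
    ):
        if count < low:
            problems.append(f"{label}: {count} samples, need at least {low}")
        elif count > high:
            problems.append(
                f"{label}: {count} samples, above the stated maximum of {high}"
            )
    return problems


def most_recent(samples, limit=MAX_PROCESSED):
    """Return the samples that fall within the limit, newest first."""
    return sorted(samples, key=lambda s: s.created, reverse=True)[:limit]
```

For example, `check_seed_counts(50, 150)` returns an empty list, while `check_seed_counts(49, 150)` reports that the positive set is one sample short.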
Tip
For best results, have at least 200 items in your test sample set, including at least 50 positive examples and at least 150 negative examples.
How to create a trainable classifier
In preview: The following process automates the testing of trainable classifiers and shortens the creation workflow from 12 days to two days. (In some cases, the process can take only a few hours.)
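Collect a first set of seed content (50 to 500 items) that strongly represents the data that belongs in the category.
Collect a second set of seed content (150 to 1,500 items) that represents data that doesn't belong in the category.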
Place the positive and negative seed content in separate SharePoint folders. Each folder must be dedicated to holding only the seed content. Make note of the site, library, and folder URL for each set.
Tip
If you create a new SharePoint site and folder for your seed data, allow at least an hour for that location to be indexed before creating the trainable classifier that will use that seed data.
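The folder and indexing guidance above can be captured in a small planning helper. This is a sketch under stated assumptions: `SeedLocation`, `folders_are_separate`, and `earliest_creation_time` are hypothetical names, the `contoso` URL in the usage note is a placeholder, and treating SharePoint paths as case-insensitive is an assumption. The code does not call any SharePoint or Purview API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# "At least an hour" for indexing, as stated in the tip above.
INDEXING_WAIT = timedelta(hours=1)


@dataclass(frozen=True)
class SeedLocation:
    """Hypothetical record of the site, library, and folder for one seed set."""
    site_url: str
    library: str
    folder: str

    def key(self):
        # Assumption: compare paths case-insensitively, ignoring stray slashes.
        return (
            self.site_url.rstrip("/").casefold(),
            self.library.strip("/").casefold(),
            self.folder.strip("/").casefold(),
        )


def folders_are_separate(positive, negative):
    """True when the two seed sets do not point at the same folder."""
    return positive.key() != negative.key()


def earliest_creation_time(folder_created_at):
    """Earliest time to create the classifier for a newly created location."""
    return folder_created_at + INDEXING_WAIT


def is_indexed_long_enough(folder_created_at, now):
    """True once at least the indexing wait has elapsed."""
    return now >= earliest_creation_time(folder_created_at)
```

A usage example: with `SeedLocation("https://contoso.sharepoint.com/sites/seed", "Documents", "Positive")` and a matching `"Negative"` folder, `folders_are_separate` returns `True`, and a folder created at 09:00 is ready for classifier creation from 10:00 onward.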
Add the source of your positive examples: select the SharePoint site, library, and folder URL for the seed content that should be detected by the classifier and then choose Next.
Add the source of your negative examples: select the SharePoint site, library, and folder URL for the seed content that should be ignored by the classifier and then choose Next.
Review the settings and choose Create trainable classifier.
Within 24 hours, the trainable classifier processes the seed data and builds a prediction model. The classifier status is In progress while it processes the seed data. When processing finishes, the status changes to Training is complete and items have been tested.
Once training is complete and items have been (automatically) tested, publish the classifier by choosing Publish for use.
Once the trainable classifier processes enough positive and negative samples to build a prediction model, you need to test the predictions it makes. In testing the classifier, you verify whether its predictions are correct. Once all of the data is processed, go through the results manually and mark each prediction as correct, incorrect, or not sure. Microsoft uses this feedback in aggregate to improve the prediction model.
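If you track your manual review outside the portal, the three verdicts can be tallied with a few lines of code. This is only an illustrative sketch: `summarize_review` and the verdict strings are hypothetical, and the accuracy figure (correct divided by correct plus incorrect, ignoring "not sure") is one possible summary, not a metric that Microsoft Purview is documented to report.

```python
from collections import Counter

# The three verdicts a reviewer can give each prediction, per the text above.
VERDICTS = ("correct", "incorrect", "not sure")


def summarize_review(verdicts):
    """Count reviewer verdicts and compute accuracy over the decided items."""
    counts = Counter(verdicts)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")

    decided = counts["correct"] + counts["incorrect"]
    summary = {verdict: counts[verdict] for verdict in VERDICTS}
    # Accuracy is undefined when no item received a definite verdict.
    summary["accuracy"] = counts["correct"] / decided if decided else None
    return summary
```

Excluding "not sure" items from the denominator keeps undecided reviews from lowering the figure; whether that choice is appropriate depends on how you intend to use the number.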