Create a predictive coding model (preview)

The first step in using the machine learning capabilities of predictive coding in eDiscovery (Premium) is to create a predictive coding model. After you create a model, you can train it to identify the relevant and non-relevant content in a review set.

To review the predictive coding workflow, see Learn about predictive coding in eDiscovery (Premium).


Before you create a model

  • There must be a minimum of 2,000 items in a review set to create a predictive coding model.
  • Be sure to commit all collections to the review set before you create a model. Items added to a review set after the model is created will not be processed or assigned a prediction score by the model.
  • Any item in the review set that doesn't contain text will not be processed by the model or assigned a prediction score. Items with text will be included in the control set or a training set.
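The prerequisites above can be expressed as a quick pre-flight check. The following is a minimal sketch only: `ReviewItem`, `check_review_set`, and the idea of holding review set items in a local list are hypothetical names used for illustration, not part of any eDiscovery (Premium) API. It assumes the 2,000-item minimum applies to all items in the review set, as stated above.

```python
from dataclasses import dataclass

# Minimum review set size stated in the prerequisites above.
MIN_REVIEW_SET_ITEMS = 2000


@dataclass
class ReviewItem:
    """Hypothetical stand-in for an item in a review set."""
    item_id: str
    text: str  # extracted text; empty when the item has no text


def check_review_set(items):
    """Return (can_create_model, scorable_items).

    can_create_model: True if the review set meets the item minimum.
    scorable_items:   items that contain text; items without text
                      would not receive a prediction score.
    """
    scorable = [item for item in items if item.text.strip()]
    return len(items) >= MIN_REVIEW_SET_ITEMS, scorable
```

For example, a review set of 2,000 items in which 100 items have no text would pass the size check, but only the 1,900 items with text would be eligible for the control set, a training set, and scoring.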

Create a model

  1. In the Microsoft Purview compliance portal, open an eDiscovery (Premium) case and then select the Review sets tab.

  2. Open a review set and then select Analytics > Manage predictive coding (preview).

  3. On the Predictive coding models (preview) page, select New model.

  4. On the flyout page, type a name for the model and an optional description.

  5. Optionally, select Advanced options on the flyout page to configure the confidence level and margin of error. These settings affect the number of items included in the control set. During the training process, the control set is used to compare the prediction scores that the model assigns to items against the labeling that you perform during the training rounds. If your organization has guidelines about confidence level and margin of error for document review, specify them in the appropriate boxes. Otherwise, use the default settings.

  6. Select Save to create the model.

    It takes a couple of minutes for the system to prepare your model. After it's ready, you can perform the first round of training.

What happens after you create a model

While the system creates and prepares the model, the following things occur in the background:

  • The system calculates the number of items for the control set. This size is based on the number of items in the review set and the settings for the confidence level and the margin of error. Items for the control set are randomly selected and designated as control set items. The system includes 10 items from the control set in the first round of training.
  • The system randomly selects 40 items from the review set to be included in the training set for the first round of training. Therefore, the first round of training includes 50 items for labeling: 40 items from the training set and 10 items from the control set.
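To build intuition for how the confidence level and margin of error influence the control set, the sketch below uses a standard statistical sample-size formula (Cochran's formula with a finite population correction). This is an assumption made for illustration: the formula that eDiscovery (Premium) actually uses is not stated here, and the function names, the 95% and 5% example values, and the choice to draw training items from outside the control set are all hypothetical. Only the first-round counts (40 training items plus 10 control set items) come from the description above.

```python
import math
import random
from statistics import NormalDist

FIRST_ROUND_TRAINING_ITEMS = 40  # from the description above
FIRST_ROUND_CONTROL_ITEMS = 10   # from the description above


def control_set_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Estimate a sample size with Cochran's formula (assumed, not documented).

    confidence and margin are illustrative values, not the product defaults.
    p=0.5 is the most conservative guess for the share of relevant items.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * z * p * (1 - p)) / (margin * margin)
    # Finite population correction for a review set of known size.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def first_round(item_ids, control_size, seed=None):
    """Randomly pick a control set, then assemble the first training round."""
    rng = random.Random(seed)
    control = rng.sample(item_ids, control_size)
    control_ids = set(control)
    # Assumption: training items are drawn from items outside the control set.
    remaining = [i for i in item_ids if i not in control_ids]
    training = rng.sample(remaining, FIRST_ROUND_TRAINING_ITEMS)
    batch = training + control[:FIRST_ROUND_CONTROL_ITEMS]
    return control, training, batch
```

Under this formula, a higher confidence level or a smaller margin of error produces a larger control set, which matches the statement that these settings affect the number of control set items. Whatever the control set size, the first-round batch always contains 50 items for labeling.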

Next steps

After you create a model for a review set, the next step is performing training rounds to "teach" the model to identify content that is relevant to your investigation. For more information, see Train a predictive coding model.