Quick start: Predictive coding in eDiscovery (Premium) (preview)

Important

Predictive coding has been retired as of March 31, 2024 and is not available in new eDiscovery cases. For existing cases with trained predictive coding models, you can continue to apply existing score filters to review sets. However, you can't create or train new models.

This article presents a quick start for using predictive coding in Microsoft Purview eDiscovery (Premium). The predictive coding module uses machine learning to help you cull large volumes of case content that isn't relevant to your investigation. You do this by creating and training your own predictive coding models, which help you prioritize the most relevant items for review.

Here's a quick overview of the predictive coding process:

Figure: Quick start process for predictive coding.

To get started, you create a model and label as few as 50 items as relevant or not relevant. The system then uses this training to apply prediction scores to every item in the review set. You can then filter items by prediction score, which lets you review the most relevant (or non-relevant) items first. If you want a model with higher accuracy and recall, you can continue labeling items in subsequent training rounds until the model stabilizes. After the model stabilizes, you can apply the final prediction filter to prioritize items for review.
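To make the label, score, and prioritize loop concrete, here is a minimal toy sketch in Python. It is not the model that eDiscovery (Premium) uses; the word-weight scorer and the names `train`, `score`, and `prioritize` are hypothetical stand-ins chosen only to show how a handful of labeled items can yield a score for every item in a set.

```python
from collections import Counter

# Toy stand-in for a relevance model. This is NOT the Purview algorithm;
# all names and the scoring rule are assumptions made for illustration.

def train(labeled):
    """Learn per-word weights in [-1, 1] from (text, is_relevant) pairs."""
    relevant, not_relevant = Counter(), Counter()
    for text, is_relevant in labeled:
        target = relevant if is_relevant else not_relevant
        # Count each word once per item so long items don't dominate.
        target.update(set(text.lower().split()))
    weights = {}
    for word in set(relevant) | set(not_relevant):
        r, n = relevant[word], not_relevant[word]
        weights[word] = (r - n) / (r + n)
    return weights

def score(text, weights):
    """Return a score in [0, 1]; 0.5 means no evidence either way."""
    known = [w for w in set(text.lower().split()) if w in weights]
    if not known:
        return 0.5
    mean_weight = sum(weights[w] for w in known) / len(known)
    return (mean_weight + 1) / 2

def prioritize(texts, weights):
    """Order items so the highest-scoring ones are reviewed first."""
    return sorted(texts, key=lambda t: score(t, weights), reverse=True)
```

Under these assumptions, calling `train` on the labeled items from one round and then `prioritize` on the full set mirrors the idea of scoring every item after a training round; labeling more items in later rounds simply means calling `train` again with the larger labeled list.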

For a detailed overview of predictive coding, see Learn about predictive coding in eDiscovery (Premium).

Tip

If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview compliance portal trials hub. Learn details about signing up and trial terms.

Step 1: Create a new predictive coding model

The first step is to create a new predictive coding model in the review set.

  1. In the Microsoft Purview compliance portal, open an eDiscovery (Premium) case and then select the Review sets tab.

  2. Open a review set and then select Analytics > Manage predictive coding (preview).

    Screenshot: The Analytics dropdown menu in the review set, which opens the Predictive coding page.

  3. On the Predictive coding models (preview) page, select New model.

  4. On the flyout page, type a name for the model and an optional description.

  5. Select Save to create the model.

    It takes a couple of minutes for the system to prepare your model. After it's ready, you can perform the first round of training.

For more detailed instructions, see Create a predictive coding model.

Step 2: Perform the first training round

After you create the model, the next step is to complete the first training round by labeling items as relevant or not relevant.

  1. Open the review set and then select Analytics > Manage predictive coding (preview).

  2. On the Predictive coding models (preview) page, select the model that you want to train.

  3. On the Overview tab, under Round 1, select Start next training round.

    The Training tab is displayed and contains 50 items for you to label.

  4. Review each document and then select the Relevant or Not relevant button at the bottom of the reading pane to label it.

  5. After you've labeled all 50 items, select Finish.

    It takes a couple of minutes for the system to "learn" from your labeling and update the model. When this process is complete, a status of Ready is displayed for the model on the Predictive coding models (preview) page.

For more detailed instructions, see Train a predictive coding model.

Step 3: Apply the prediction score filter to items in the review set

After you perform at least one training round, you can apply the prediction score filter to items in the review set. This lets you review the items that the model has predicted as relevant or not relevant.

  1. Open the review set.

    The pre-loaded default filters are displayed at the top of the review set page. You can leave these set to Any.

  2. Select Filters to display the Filters flyout page.

  3. Expand the Analytics & predictive coding section to display a set of filters.

    Screenshot: Prediction score filter in the Analytics & predictive coding section.

    The naming convention for prediction score filters is Prediction score (model name). For example, the prediction score filter name for a model named Model A is Prediction score (Model A).

  4. Select the prediction score filter that you want to use and then select Done.

  5. On the review set page, select the dropdown for the prediction score filter and enter the minimum and maximum values for the prediction score range. For example, the following screenshot shows a prediction score range between 0.5 and 1.0.

    Screenshot: Minimum and maximum values for the prediction score filter.

  6. Select outside the filter to automatically apply the filter to the review set.

A list of documents with a prediction score within the range you specified is displayed on the review set page.
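The filter naming convention and the range filter from the steps above can be sketched in a few lines of Python. This is an illustration only, not a Purview API: `ReviewItem`, `filter_name`, and `apply_score_filter` are hypothetical names, and the sketch assumes that scores fall between 0 and 1 and that both ends of the range are inclusive.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for items in a review set.
@dataclass
class ReviewItem:
    item_id: str
    prediction_score: float  # assumed to lie between 0.0 and 1.0

def filter_name(model_name: str) -> str:
    """Build the filter name using the 'Prediction score (model name)' convention."""
    return f"Prediction score ({model_name})"

def apply_score_filter(items, minimum, maximum):
    """Keep items whose score is within [minimum, maximum], highest score first.

    Inclusive bounds are an assumption made for this sketch.
    """
    if not 0.0 <= minimum <= maximum <= 1.0:
        raise ValueError("expected 0.0 <= minimum <= maximum <= 1.0")
    matches = [i for i in items if minimum <= i.prediction_score <= maximum]
    return sorted(matches, key=lambda i: i.prediction_score, reverse=True)
```

For example, `filter_name("Model A")` gives `Prediction score (Model A)`, and `apply_score_filter(items, 0.5, 1.0)` corresponds to the 0.5 to 1.0 range used in step 5. Sorting by score is a convenience of the sketch; the review set itself just lists the matching items.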

For more detailed instructions, see Apply a prediction filter to a review set.

Step 4: Perform more training rounds

More than likely, you'll have to perform more training rounds so that the model better predicts relevant and non-relevant items in the review set. In general, you continue training the model until it's stable enough to meet your requirements.
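This quick start doesn't define how stability is measured, so the following Python sketch shows just one possible way to think about it: compare each item's score between two consecutive rounds and stop when the average change is small. The functions `mean_score_change` and `is_stable` and the `tolerance` value are assumptions for illustration, not the criterion that eDiscovery (Premium) applies.

```python
# Hypothetical stabilization check; the criterion and threshold are assumptions.

def mean_score_change(previous, current):
    """Average absolute score change for items scored in both rounds.

    Each argument maps an item ID to that round's prediction score.
    """
    shared = previous.keys() & current.keys()
    if not shared:
        raise ValueError("the two rounds have no items in common")
    return sum(abs(current[k] - previous[k]) for k in shared) / len(shared)

def is_stable(rounds, tolerance=0.05):
    """Return True when the last two rounds differ by at most `tolerance` on average."""
    if len(rounds) < 2:
        return False  # a single round gives nothing to compare against
    return mean_score_change(rounds[-2], rounds[-1]) <= tolerance
```

With a check like this, you would keep running training rounds while `is_stable` returns `False`. In practice, base the decision on the training results that the model itself reports and on the requirements of your case.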

For more information, see Perform additional training rounds.

Step 5: Apply the final prediction score filter to prioritize review

After you complete all the training rounds and the model is stable, repeat the instructions in Step 3 to apply the final prediction score filter to the review set. This lets you prioritize the review of relevant and non-relevant items.