Near duplicate detection in eDiscovery (Premium)

2025-01-08

Important

This article applies only to the classic eDiscovery (Premium) experience. The classic eDiscovery (Premium) experience will be retired in August 2025 and won't be available as an experience option in the Microsoft Purview portal after retirement.

We recommend that you start planning for this transition early and start using the new eDiscovery experience in the Microsoft Purview portal. To learn more about using the most current eDiscovery capabilities and features, see Learn about eDiscovery.

Consider a set of documents to be reviewed in which a subset is based on the same template and shares mostly the same boilerplate language, with only a few differences. If a reviewer could identify this subset, review one document thoroughly, and then review only the differences in the rest, they would miss no unique information while spending a fraction of the time it would take to read every document cover to cover. Near duplicate detection groups textually similar documents together to help make your review process more efficient.

Tip

If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview trials hub. Learn details about signing up and trial terms.

How does it work?

When near duplicate detection runs, the system parses every document that contains text. It then compares each document against every other document to determine whether their similarity is greater than the configured threshold. If it is, the documents are grouped together. After all documents have been compared and grouped, one document from each group is marked as the "pivot". When reviewing, you can review the pivot first and then review the other documents in the same near duplicate set, focusing on the differences between the pivot and the document under review.
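
The following Python sketch illustrates the general idea of comparing, grouping, and pivot selection. It is not the algorithm eDiscovery (Premium) uses, which this article doesn't specify. The similarity measure (Jaccard similarity over three-word shingles), the function names, the threshold value in the usage example, and the rule that the first document in a group becomes the pivot are all assumptions made for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a document (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold):
    """Group documents whose pairwise similarity is greater than threshold.

    docs maps a document ID to its text. Returns a list of
    (pivot_id, member_ids) tuples, one per group.
    """
    ids = list(docs)
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in ids}
    parent = {doc_id: doc_id for doc_id in ids}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare each document against every other document.
    for a, b in combinations(ids, 2):
        if jaccard(sets[a], sets[b]) > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect members per group, preserving input order.
    groups = {}
    for doc_id in ids:
        groups.setdefault(find(doc_id), []).append(doc_id)

    # Hypothetical pivot rule: the first member in input order is the pivot.
    return [(members[0], members) for members in groups.values()]


docs = {
    "d1": "This agreement is made between Contoso and Fabrikam on the first day of March",
    "d2": "This agreement is made between Contoso and Northwind on the first day of March",
    "d3": "Lunch menu for Friday includes soup salad and sandwiches",
}
result = group_near_duplicates(docs, threshold=0.5)
for pivot, members in result:
    print(pivot, members)
```

In this example, d1 and d2 differ by a single word and land in the same group with d1 as the pivot, while d3 forms a group of its own. Comparing every pair scales quadratically with the number of documents, so the sketch is meant to show the concept, not to scale to a real review set.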
