Duplicate Detection in CRM 4.0 – Good, Bad & Ugly

CRM 4.0 provides a powerful feature, duplicate detection, that enables users and administrators to identify and manage duplicate records in the system. Duplicate detection is enabled by default in a CRM 4.0 installation. It can detect duplicates when records are created or updated, when the Outlook client comes online, and when data is imported. These settings are organization-specific.

Like any other feature, Duplicate Detection in CRM 4.0 requires some basic understanding; otherwise it can have an adverse effect on the overall performance of the system. I have observed CRM 4.0 environments behaving unexpectedly due to inappropriate usage of duplicate detection. Here is some basic information that can be useful for both users and administrators.

Duplicate detection rules

Duplicate detection rules are associated with entities and contain the conditions that define a duplicate record. A rule can have just one condition or several. For example, a duplicate detection rule for the Contact entity can have a single condition such as ‘same first name’, or a combination of conditions such as ‘same first name’ plus ‘same e-mail address’. A duplicate detection rule becomes active when it is published. Publishing is processed by the Microsoft CRM Asynchronous Service as a system job of type ‘Duplicate detection rule publish’.
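To make the idea of single-condition versus multi-condition rules concrete, here is a minimal Python sketch. It is only an illustration of the matching logic, not how CRM 4.0 implements rules internally; the function name, the field names and the case-insensitive comparison are all assumptions made for this example.

```python
from collections import defaultdict

def find_duplicates(records, rule_fields):
    """Group record ids whose values match on every field in rule_fields.

    Hypothetical helper for illustration only; not a CRM API.
    """
    groups = defaultdict(list)
    for rec in records:
        # Assumption: values are compared ignoring case and surrounding spaces.
        key = tuple((rec.get(f) or "").strip().lower() for f in rule_fields)
        if all(key):  # skip records with an empty value in any rule field
            groups[key].append(rec["id"])
    # Only groups with more than one record contain duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]

contacts = [
    {"id": 1, "firstname": "Ann", "email": "ann@example.com"},
    {"id": 2, "firstname": "ann", "email": "ANN@example.com"},
    {"id": 3, "firstname": "Ann", "email": "ann.b@example.com"},
    {"id": 4, "firstname": "Bob", "email": "bob@example.com"},
]

# Rule with one condition: same first name.
by_name = find_duplicates(contacts, ["firstname"])
# Rule with two conditions: same first name + same e-mail address.
by_name_and_email = find_duplicates(contacts, ["firstname", "email"])
```

With the sample data, the one-condition rule groups contacts 1, 2 and 3 together, while the stricter two-condition rule only matches contacts 1 and 2. The fewer conditions a rule has, the more records it is likely to flag.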

The ‘Duplicate detected’ window, which lists the duplicate records when a user saves a record that matches a rule's conditions, starts appearing only after the rule is published and the related system tables are populated.

Duplicate detection jobs

Duplicate detection jobs are associated with an entity and rely on all the rules associated with that entity to identify duplicates among the records that meet the job's criteria. The criteria on which a duplicate detection job operates can be as narrow as My Active Contacts, as broad as Active Contacts, or customized in the same way as an Advanced Find. These jobs are processed by the Async Service as system jobs of type ‘Bulk duplicate detection job’.

A duplicate detection job identifies duplicates among the records in the given criteria, based on the conditions in all the rules associated with the entity. It treats each record that has duplicates as a base record and the matching records as potential duplicates, and it captures a set of duplicate records for each base record. For example, say you have 4 Contacts with the same e-mail address, and the e-mail address is the condition of your rule. When a duplicate detection job is run on Contacts, the total number of records captured is 12: each of the 4 records has 3 duplicates (the other records with the same e-mail address), and every unique combination of base record and duplicate record is stored (4 x 3 = 12).
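The arithmetic behind the 12 captured records can be checked with a few lines of Python. This is just a sketch of the counting, with a hypothetical helper name; it does not touch CRM itself. A group of n mutually matching records produces n x (n - 1) base/duplicate combinations.

```python
from itertools import permutations

def duplicate_rows(group):
    """Return every (base record, duplicate record) pair for a group of
    mutually matching records. Hypothetical helper for illustration."""
    # Ordered pairs: (A, B) and (B, A) are separate rows, because each
    # record acts as a base record in its own right.
    return list(permutations(group, 2))

contacts_with_same_email = ["A", "B", "C", "D"]
rows = duplicate_rows(contacts_with_same_email)
print(len(rows))  # 12
```

Because the count grows with n x (n - 1), larger groups of matching records get expensive quickly: 10 matching contacts already produce 90 captured records.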

This information is captured so that the administrator can identify unwanted records as per their business requirements and manage the duplicates identified. To visualize this, open a succeeded duplicate detection job and click on view duplicates. This shows the base records in the top list and the duplicates of the selected base record in the bottom list. The bottom list provides options to delete, activate, deactivate, edit and merge records, and to run a workflow.

Records captured by the duplicate detection job are stored in the duplicaterecordbase table in the _MSCRM database. When a duplicate detection job is deleted through the UI, all relevant records are deleted from the duplicaterecordbase table. Consider a situation where a duplicate detection job recurs on all records, its results have never been used to manage the duplicates, and the job itself is never deleted. The duplicaterecordbase table then keeps a record of all the duplicates and keeps growing, which impacts the overall performance of the system.
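A rough back-of-the-envelope model of this growth is sketched below in Python. It rests on two assumptions that are mine, not documented behavior: that every run of a recurring job stores its own set of rows until it is deleted, and that the data does not change between runs. The group sizes and run count are made-up numbers for illustration.

```python
def rows_per_run(group_sizes):
    """Rows captured by one job run: n * (n - 1) per group of n matches."""
    return sum(n * (n - 1) for n in group_sizes)

def rows_retained(group_sizes, undeleted_runs):
    """Rows left in the table if no run is ever deleted.

    Assumption: each run keeps its own rows and the data is unchanged
    between runs.
    """
    return rows_per_run(group_sizes) * undeleted_runs

# Hypothetical data: three groups of matching contacts (4, 2 and 3 records).
groups = [4, 2, 3]
per_run = rows_per_run(groups)                       # 12 + 2 + 6 = 20
after_30_runs = rows_retained(groups, undeleted_runs=30)  # 20 * 30 = 600
```

Even in this toy case, 9 duplicate contacts leave 600 rows behind after 30 runs, which gives a feel for how the table can end up far larger than the entity it describes.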

Best practices

For administrative purposes, run duplicate detection jobs on narrowed-down criteria that give you some idea of the number of duplicates that exist. Use the job results to cleanse the duplicate records, which keeps the duplicaterecordbase table in shape. Succeeded duplicate detection jobs can then be removed, which clears their related records from the duplicaterecordbase table. Removing the job is especially desirable when it was run on broad criteria or a large data set, e.g. Active Contacts.

I have observed the duplicaterecordbase table grow to 79 million rows in an environment where the actual entity record count was as low as 183,770. This environment had multiple performance-related issues. Removing the recurring duplicate detection jobs from Settings > Data Management (a lengthy operation that was executed in batches overnight) resulted in a welcome change in performance and improved the general response times of the application.

Cheers,
Bhavesh