Can any Azure AI service help find possible duplicate customer data?

Henry Zhang 206 Reputation points
2024-09-04T20:20:05.4166667+00:00

Hello Everyone,

We have a customer table that includes information such as first name, last name, email, date of birth, address, and phone number. We're exploring whether there's an Azure AI service that can help us identify potential duplicate records. For example:

  • Record A: Henry Zhang, same phone
  • Record B: H Zhang, same phone

These might be duplicate entries.
If possible, we would like to see a confidence score indicating the likelihood that they are duplicates.


Accepted answer
  1. Grmacjon-MSFT 18,451 Reputation points
    2024-10-15T20:32:05.4933333+00:00

    Hi @Henry Zhang, yes, this is possible with Azure AI Search.

    Create an Azure AI Search Index:

    • Set up a new Azure AI Search service within your Azure subscription.
    • Create a new index and define the fields that will store your customer data (e.g., firstName, lastName, phoneNumber, address, dateOfBirth).
    • Ensure that the fields are indexed appropriately for efficient search: for example, searchable for name and address fields, and filterable for fields you want to match exactly, such as phoneNumber.

    Index Your Customer Data:

    • Use the Azure AI Search SDK or REST API to index your customer data into the newly created index. Each customer record should be represented as a document with the corresponding field values.
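
As an illustration of the two steps above, here is a minimal sketch of an index definition and an indexing batch in the JSON shapes the REST API accepts. The index name `customers`, the `id` key field, the choice of `Edm.String` for every field, and the helper `to_upload_batch` are assumptions made for this example, not something the original post specifies. The sketch only builds the payloads; sending them to your service is left out.

```python
# Hypothetical index definition based on the customer columns in the question.
# Field types and attributes are assumptions; adjust them to your data.
INDEX_DEFINITION = {
    "name": "customers",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "firstName", "type": "Edm.String", "searchable": True},
        {"name": "lastName", "type": "Edm.String", "searchable": True},
        {"name": "email", "type": "Edm.String", "searchable": True, "filterable": True},
        {"name": "phoneNumber", "type": "Edm.String", "searchable": False, "filterable": True},
        {"name": "address", "type": "Edm.String", "searchable": True},
        {"name": "dateOfBirth", "type": "Edm.String", "filterable": True},
    ],
}


def to_upload_batch(rows):
    """Wrap table rows in the {"value": [...]} batch shape used for indexing."""
    field_names = {field["name"] for field in INDEX_DEFINITION["fields"]}
    actions = []
    for row in rows:
        # Keep only columns that exist in the index definition.
        doc = {key: value for key, value in row.items() if key in field_names}
        doc["id"] = str(row["id"])  # document keys must be strings
        doc["@search.action"] = "mergeOrUpload"
        actions.append(doc)
    return {"value": actions}
```

Marking `phoneNumber` as filterable is what later allows an exact-match filter on it.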

    Create a Similarity Search Query:

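    • Azure AI Search does not expose a dedicated "similarity" query type or per-field similarity metrics in its request body. For name matching, the closest built-in feature is fuzzy search in the full Lucene query syntax ("queryType": "full"), where a term followed by ~ matches terms within a small edit distance. Exact matching on a field such as the phone number is done with an OData filter, which requires that field to be filterable. Cosine similarity applies only to vector fields that store embeddings. For example, the body of a Search Documents POST request could look like this (the phone value is a placeholder). With "searchMode": "any", a record that matches only the last name, such as "H Zhang", is still returned but ranked lower:
    {
        "search": "firstName:Henry~1 lastName:Zhang~1",
        "queryType": "full",
        "searchMode": "any",
        "filter": "phoneNumber eq '<phone number of the record being checked>'",
        "select": "firstName, lastName, phoneNumber, address",
        "top": 10
    }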
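
To run a lookup like this for every customer record, the request body can be generated from the record itself. The following is a rough sketch that assumes fuzzy matching through the full Lucene syntax plus an OData filter on the phone number; the helper names (`escape_lucene`, `fuzzy_clause`, `build_duplicate_query`) and the field names are hypothetical. Only the body is built here; posting it to your index is not shown.

```python
import re

# Characters that have special meaning in the full Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(term):
    """Backslash-escape Lucene special characters in a single term."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def fuzzy_clause(field, value, distance=1):
    """Build a fielded fuzzy clause for each whitespace-separated token."""
    terms = [f"{escape_lucene(token)}~{distance}" for token in value.split()]
    return " ".join(f"{field}:{term}" for term in terms)


def build_duplicate_query(record, top=10):
    """Return a search request body that looks for likely duplicates of record."""
    clauses = [
        fuzzy_clause("firstName", record["firstName"]),
        fuzzy_clause("lastName", record["lastName"]),
    ]
    # Single quotes inside an OData string literal are escaped by doubling them.
    phone = record["phoneNumber"].replace("'", "''")
    return {
        "search": " ".join(clause for clause in clauses if clause),
        "queryType": "full",
        "searchMode": "any",
        "filter": f"phoneNumber eq '{phone}'",
        "select": "id,firstName,lastName,phoneNumber",
        "top": top,
    }
```

Escaping matters because names such as "Smith-Jones" contain characters that the Lucene parser would otherwise treat as operators.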
    

    Evaluate Results:

    • Execute the query and analyze the results. Azure AI Search returns the matching documents ranked by relevance, each with an @search.score value.
    • You can set a score threshold to decide which results are potential duplicates. Note that for text queries @search.score is a relevance score with no fixed upper bound, not a probability between 0 and 1, so a cutoff such as 0.8 is only meaningful for a score you have normalized yourself. Calibrate the threshold against records you already know to be duplicates.
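
Since the question asks for a confidence score, one possible approach is to rescore each candidate pair on the client so the result falls between 0 and 1. This is a minimal sketch using Python's standard `difflib`; the equal weighting of name and phone, the field names, and the function names are assumptions for illustration, and the weights would need tuning on real data.

```python
from difflib import SequenceMatcher


def _digits(phone):
    """Keep only the digits so formatting differences do not matter."""
    return "".join(ch for ch in phone if ch.isdigit())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_confidence(rec_a, rec_b, name_weight=0.5, phone_weight=0.5):
    """Weighted score in [0, 1]; the weights are an assumption to tune."""
    full_a = f"{rec_a['firstName']} {rec_a['lastName']}"
    full_b = f"{rec_b['firstName']} {rec_b['lastName']}"
    name_score = name_similarity(full_a, full_b)
    phone_a = _digits(rec_a["phoneNumber"])
    phone_b = _digits(rec_b["phoneNumber"])
    phone_score = 1.0 if phone_a and phone_a == phone_b else 0.0
    return name_weight * name_score + phone_weight * phone_score
```

With these weights, "Henry Zhang" and "H Zhang" sharing a phone number score above 0.8, while two records with different phone numbers cannot reach 0.5 however similar the names are.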

    To get the best results, make sure that your data is clean and consistent before indexing. Inconsistencies in spelling or formatting can affect similarity scores.
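
A cleanup pass of this kind might look like the sketch below. The specific rules (Unicode NFKC normalization, collapsing whitespace, case folding, digits-only phone numbers) and the function names are assumptions chosen for illustration; whether to index normalized values alongside the originals is a separate design choice.

```python
import re
import unicodedata


def normalize_text(value):
    """Normalize Unicode forms, collapse whitespace, and fold case."""
    value = unicodedata.normalize("NFKC", value or "")
    return re.sub(r"\s+", " ", value).strip().casefold()


def normalize_phone(value):
    """Strip everything except digits, e.g. spaces, dashes, parentheses."""
    return re.sub(r"\D", "", value or "")


def normalize_record(record):
    """Return a cleaned copy of the fields used for duplicate matching."""
    return {
        "firstName": normalize_text(record.get("firstName")),
        "lastName": normalize_text(record.get("lastName")),
        "email": normalize_text(record.get("email")),
        "phoneNumber": normalize_phone(record.get("phoneNumber")),
    }
```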

    Best,

    Grace


1 additional answer

  1. Moazzem Hossain 0 Reputation points
    2024-09-04T20:35:09.8133333+00:00

    Yes, Azure offers AI services that can help you identify possible duplicate customer data. One option is Text Analytics (part of Azure Cognitive Services) for entity recognition: you can use it to identify and extract entities such as names, addresses, and phone numbers from customer data. By comparing these entities across records, you can detect potential duplicates.

