
Can any Azure AI service help find possible duplicate customer data?

Henry Zhang 206 Reputation points
2024-09-04T20:20:05.4166667+00:00

Hello Everyone,

We have a customer table that includes information such as first name, last name, email, date of birth, address, and phone number. We're exploring whether there's an Azure AI service that can help us identify potential duplicate records. For example:

  • Record A: Henry Zhang, same phone
  • Record B: H Zhang, same phone

These might be duplicate entries.
If possible, we would like to see a confidence score indicating the likelihood that they are duplicates.

Azure Language in Foundry Tools

An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services, Foundry Tools is a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform.


Answer accepted by question author

  1. Grmacjon-MSFT 19,511 Reputation points Moderator
    2024-10-15T20:32:05.4933333+00:00

    Hi @Henry Zhang, yes, this is possible with Azure AI Search.

    Create an Azure AI Search Index:

    • Set up a new Azure AI Search service within your Azure subscription.
    • Create a new index and define the fields that will store your customer data (e.g., firstName, lastName, phoneNumber, address, dateOfBirth).
    • Ensure that the fields are indexed appropriately for efficient search: mark the fields you want to match on as searchable, and any field you want to filter on as filterable.

    Index Your Customer Data:

    • Use the Azure AI Search SDK or REST API to index your customer data into the newly created index. Each customer record should be represented as a document with the corresponding field values.
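
The two steps above can be sketched as plain request bodies, without calling the service. This is a minimal illustration, assuming the REST shapes for an index definition and a document upload batch; the index name `customers`, the `id` key field, and the sample phone numbers are made up for the example, and the attribute choices are one reasonable setup rather than a recommendation from the answer.

```python
import json

# Hypothetical index name; the field names follow the ones listed in the answer.
INDEX_NAME = "customers"


def build_index_definition(name=INDEX_NAME):
    """Return an index definition body with one key field and the customer fields."""

    def text_field(field_name):
        # Searchable string fields take part in full-text queries.
        return {"name": field_name, "type": "Edm.String", "searchable": True}

    return {
        "name": name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            text_field("firstName"),
            text_field("lastName"),
            text_field("email"),
            text_field("address"),
            # Filterable so that an exact phone match can be expressed as a filter.
            {"name": "phoneNumber", "type": "Edm.String",
             "searchable": True, "filterable": True},
            {"name": "dateOfBirth", "type": "Edm.DateTimeOffset", "filterable": True},
        ],
    }


def build_upload_batch(records):
    """Wrap customer records as documents, one action per record."""
    return {"value": [{"@search.action": "mergeOrUpload", **r} for r in records]}


index_body = build_index_definition()
batch = build_upload_batch([
    {"id": "1", "firstName": "Henry", "lastName": "Zhang", "phoneNumber": "5550100"},
    {"id": "2", "firstName": "H", "lastName": "Zhang", "phoneNumber": "5550100"},
])
print(json.dumps(batch, indent=2))
```

The bodies would then be sent with the SDK or an HTTP client of your choice; that part is left out here because it depends on your service endpoint and credentials.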

    Create a Similarity Search Query:

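    • Construct a query that specifies which fields to compare and how to compare each one. For text fields such as names and addresses you could use an approximate measure such as cosine similarity, and for phone numbers an exact match. The request below illustrates the idea; its property names (searchType, similarityFields, similarityMetrics) are placeholders rather than documented Azure AI Search parameters, so map them onto the query features your service supports, such as fuzzy search in the full Lucene query syntax or vector search.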
    {
        "search": {
            "query": "Henry Zhang",
            "searchType": "similarity",
            "similarityFields": [
                "firstName",
                "lastName",
                "phoneNumber",
                "address"
            ],
            "similarityMetrics": [
                {
                    "fields": ["firstName", "lastName"],
                    "metric": "cosine"
                },
                {
                    "fields": ["phoneNumber"],
                    "metric": "exact"
                }
            ]
        }
    }
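
One way to express approximate name matching with documented query features is fuzzy search in the full Lucene syntax (`queryType` set to `full`, terms suffixed with `~` and an edit distance), combined with an OData filter for the exact phone match. The helper below is a sketch of that idea, not the request from the answer: the function names are hypothetical, it assumes `phoneNumber` is filterable and stored as digits only, and it only builds the request body.

```python
import re


def fuzzy_terms(field, value, distance=1):
    """Build fielded fuzzy terms, e.g. ('firstName', 'Henry') -> 'firstName:Henry~1'."""
    # Keep only alphanumeric words so no Lucene special characters need escaping.
    words = re.findall(r"[A-Za-z0-9]+", value)
    return " ".join(f"{field}:{word}~{distance}" for word in words)


def build_fuzzy_query(first_name, last_name, phone=None, top=10):
    """Return a search request body that matches names approximately."""
    search = " ".join(
        part for part in (
            fuzzy_terms("firstName", first_name),
            fuzzy_terms("lastName", last_name),
        ) if part
    )
    body = {
        "search": search,
        "queryType": "full",   # enables the full Lucene syntax, including '~'
        "searchMode": "any",   # a record matching only one name part still scores
        "top": top,
    }
    if phone:
        digits = re.sub(r"\D", "", phone)
        body["filter"] = f"phoneNumber eq '{digits}'"
    return body


query = build_fuzzy_query("Henry", "Zhang", phone="555-0100")
print(query)
```

With `searchMode` set to `any`, a record such as "H Zhang" can still be returned on the strength of the last name and the phone filter, even though "H" is too far from "Henry" for a fuzzy match.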
    

    Evaluate Results:

    • Execute the query and analyze the results. Azure AI Search returns the matching documents ranked by a relevance score (@search.score).
    • Set a threshold on the score to decide which results are potential duplicates. For instance, with a cosine similarity score, a value above 0.8 might indicate a high likelihood of duplication. Scores from different ranking methods are not on the same scale, so calibrate the threshold against pairs you already know to be duplicates.
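
Because the question asks for a confidence score per candidate pair, one option is to re-score the candidates returned by the search on the client side. The sketch below uses only the Python standard library; the weights, the 0.8 credit for a matching initial, and the function names are assumptions made for illustration, not values produced by Azure AI Search.

```python
from difflib import SequenceMatcher

# Hypothetical weights; tune them against pairs you know to be duplicates.
WEIGHTS = {"firstName": 0.2, "lastName": 0.3, "phoneNumber": 0.5}


def text_similarity(a, b):
    """Character-level similarity in [0, 1]; empty values count as no match."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def first_name_similarity(a, b):
    """Like text_similarity, but gives partial credit when one side is an initial."""
    if a and b and (len(a) == 1 or len(b) == 1) and a[0].lower() == b[0].lower():
        return 0.8
    return text_similarity(a, b)


def duplicate_confidence(rec_a, rec_b):
    """Weighted score in [0, 1] estimating how likely two records are duplicates."""
    phone_a, phone_b = rec_a.get("phoneNumber"), rec_b.get("phoneNumber")
    same_phone = 1.0 if phone_a and phone_a == phone_b else 0.0
    return (
        WEIGHTS["firstName"] * first_name_similarity(
            rec_a.get("firstName", ""), rec_b.get("firstName", ""))
        + WEIGHTS["lastName"] * text_similarity(
            rec_a.get("lastName", ""), rec_b.get("lastName", ""))
        + WEIGHTS["phoneNumber"] * same_phone
    )


record_a = {"firstName": "Henry", "lastName": "Zhang", "phoneNumber": "5550100"}
record_b = {"firstName": "H", "lastName": "Zhang", "phoneNumber": "5550100"}
score = duplicate_confidence(record_a, record_b)
print(f"confidence: {score:.2f}")
```

For the two records from the question, the score lands well above a 0.8 threshold, while records that share neither name nor phone score close to zero.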

    To get the best results, make sure that your data is clean and consistent before indexing. Inconsistencies in spelling or formatting can affect similarity scores.
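
As a rough illustration of what cleaning before indexing can look like, the sketch below normalizes names, email addresses, and phone numbers with the Python standard library. The function names and the specific rules (case folding, collapsing whitespace, keeping only digits) are assumptions; real customer data usually needs rules adapted to its locale and formats.

```python
import re
import unicodedata


def normalize_name(value):
    """Unify Unicode forms, collapse whitespace, and ignore case."""
    text = unicodedata.normalize("NFKC", value or "")
    return " ".join(text.split()).casefold()


def normalize_phone(value):
    """Keep digits only, so '(555) 010-0100' and '555.010.0100' compare equal."""
    return re.sub(r"\D", "", value or "")


def normalize_record(record):
    """Return a cleaned copy of the fields used for duplicate matching."""
    return {
        "firstName": normalize_name(record.get("firstName")),
        "lastName": normalize_name(record.get("lastName")),
        "email": (record.get("email") or "").strip().lower(),
        "phoneNumber": normalize_phone(record.get("phoneNumber")),
    }


cleaned = normalize_record(
    {"firstName": "  Henry ", "lastName": "ZHANG", "phoneNumber": "(555) 010-0100"}
)
print(cleaned)
```

Applying the same normalization to both the indexed documents and the query values keeps exact-match comparisons, such as the phone number, from failing on formatting alone.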

    Best,

    Grace


1 additional answer

  1. Moazzem Hossain 70 Reputation points
    2024-09-04T20:35:09.8133333+00:00

    Yes, Azure offers several AI services that can help you identify possible duplicate customer data, such as Text Analytics for Entity Recognition in Azure Cognitive Services. You can use this service to identify and extract entities like names, addresses, and phone numbers from customer data. By comparing these entities across different records, you can detect potential duplicates.

