Duplicate documents in an Azure Cognitive Search index after the indexer re-runs on schedule. Can we overwrite data inside the index?

Redar Ismail Chicho 1 Reputation point
2022-11-07T22:16:03.367+00:00

I created an Azure Cognitive Search service with a Cosmos DB (SQL API) database as the data source, and I created the index and the indexer. My Cosmos DB database is refreshed daily, so I set the indexer to run on a schedule. The problem is that the index now contains duplicates and returns duplicate items: when I created the index it held 23,000 documents, and it currently holds 46,000. My search results are duplicated. This is a production app at work, so your help is much appreciated.

Azure AI Search

1 answer

  1. ajkuma 22,516 Reputation points Microsoft Employee
    2022-11-10T20:20:42.717+00:00

    @Redar Ismail Chicho, thanks for the follow-up and additional info.

    You can retrieve the data with Search Documents (Azure Cognitive Search REST API) | Microsoft Learn, collect the document keys of the duplicates, and then remove or update those documents as needed with Add, Update or Delete Documents (Azure Cognitive Search REST API) | Microsoft Learn.
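
As a rough illustration of that cleanup, the Python sketch below groups search results by a source identifier, keeps one document per group, and builds a delete batch for the Add, Update or Delete Documents endpoint. The field names `id` (index key) and `sourceId` (a field holding the Cosmos DB document ID), the service and index names, and the `api-version` value are assumptions for illustration and must be adapted to your own schema. It also assumes the index actually contains a field that identifies the source document.

```python
import json
import urllib.request
from collections import defaultdict


def find_duplicate_keys(docs, key_field, source_id_field):
    """Return index keys of surplus documents, keeping the first per source ID."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[source_id_field]].append(doc[key_field])
    surplus = []
    for keys in groups.values():
        surplus.extend(keys[1:])  # everything after the first is a duplicate
    return surplus


def build_delete_batch(keys, key_field):
    """Build the JSON body for a batch of delete actions."""
    return {"value": [{"@search.action": "delete", key_field: k} for k in keys]}


def post_batch(service, index, api_key, batch, api_version="2020-06-30"):
    """Send the batch to the documents index endpoint (not called in this sketch)."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/index?api-version={api_version}"
    )
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical search results: two index documents point at the same source item.
results = [
    {"id": "a1", "sourceId": "x"},
    {"id": "b2", "sourceId": "x"},
    {"id": "c3", "sourceId": "y"},
]
duplicate_keys = find_duplicate_keys(results, "id", "sourceId")
delete_batch = build_delete_batch(duplicate_keys, "id")
```

Reviewing `delete_batch` before sending it is a sensible precaution on a production index, since deletes cannot be undone without re-indexing.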

    I suggest reviewing your index schema to make sure the index key represents your unique document ID (such as the database document ID). If you are getting duplicate items in your index, it is probably because the key is not populated from a unique field in the source but from a random ID generated by the system rather than one defined by you; see Index overview - Azure Cognitive Search | Microsoft Learn.
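
To make the effect of the key choice concrete, here is a minimal Python simulation (not the service itself). It models an index as a dictionary keyed by document key, where writing an existing key overwrites the document and a new key adds one. The index name, field names, and the `run_indexer` helper are hypothetical. With a stable key taken from the source, a second run leaves the count unchanged; with a fresh random key per run, the count doubles, which matches the jump from 23,000 to 46,000 described in the question.

```python
import uuid

# Hypothetical index definition: exactly one field is marked as the key,
# and it is meant to carry the source document's own unique ID.
index_definition = {
    "name": "my-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
    ],
}


def run_indexer(index, source_docs, make_key):
    """Simulate one indexer run: each source doc is written under make_key(doc)."""
    for doc in source_docs:
        index[make_key(doc)] = doc  # same key overwrites, new key adds a document
    return index


source = [{"id": "1", "title": "a"}, {"id": "2", "title": "b"}]

# Key taken from the source ID: the second run overwrites the first.
stable = {}
run_indexer(stable, source, lambda d: d["id"])
run_indexer(stable, source, lambda d: d["id"])
stable_count = len(stable)

# Key regenerated on every run: the second run adds a full set of duplicates.
random_keyed = {}
run_indexer(random_keyed, source, lambda d: str(uuid.uuid4()))
run_indexer(random_keyed, source, lambda d: str(uuid.uuid4()))
random_count = len(random_keyed)
```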
