How to handle duplicate documents in Azure AI Search Index

Siddhant Kumta 120 Reputation points
2025-12-17T16:50:16.4666667+00:00

Hello,

My current pipeline uploads documents to an Azure Storage container, sends them to Document Intelligence for extraction, and then indexes them into an Azure AI Search index. I want to learn how to handle duplicate documents: how can I detect a duplicate and stop it from being uploaded? For instance, if I upload 100 documents first and later want to upload 100 more, how do I create a check that ensures none of the documents in the second batch duplicates one that is already indexed? I have seen a setup that uses a cache for this check, but that does not seem efficient.
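
To make the kind of check being asked about concrete, here is a minimal sketch of one possible approach, not a confirmed solution: derive a key from a SHA-256 hash of each file's bytes and skip any file whose key has already been seen. The names `content_key`, `split_new_and_duplicates`, and `known_keys` are hypothetical, and the in-memory set is only a stand-in for whatever lookup against the index or a metadata store the real pipeline would use. It assumes "duplicate" means byte-identical files; re-scanned or slightly edited copies would not be caught.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw bytes, used here as a document key."""
    return hashlib.sha256(data).hexdigest()


def split_new_and_duplicates(batch, known_keys):
    """Partition (name, bytes) pairs into new documents and duplicate names.

    known_keys is a hypothetical stand-in for the keys already indexed.
    It is updated in place, so duplicates within the same batch are caught too.
    """
    new_docs, duplicates = [], []
    for name, data in batch:
        key = content_key(data)
        if key in known_keys:
            duplicates.append(name)
        else:
            known_keys.add(key)
            new_docs.append({"id": key, "name": name})
    return new_docs, duplicates


# Example: the second batch contains one file identical to a file in the first.
known = set()
first, _ = split_new_and_duplicates([("a.pdf", b"alpha"), ("b.pdf", b"beta")], known)
second, dups = split_new_and_duplicates([("c.pdf", b"alpha"), ("d.pdf", b"gamma")], known)
```

If the hash is also used as the index key field, re-uploading the same content would target the same key rather than create a second entry, since keys are unique within an Azure AI Search index. Whether that fits depends on how the extracted documents are chunked and keyed in the actual pipeline.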

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.