How to handle duplicate documents in Azure AI Search Index

Siddhant Kumta

Hello,

My current pipeline uploads documents to an Azure Storage container, sends them to Document Intelligence for extraction, and then indexes the results into an Azure AI Search index. What I want to learn is how to handle duplicate documents: how to check whether a document is a duplicate and stop it from being uploaded. For instance, if I upload 100 documents first and later want to upload 100 more, how do I create a check that ensures no document in the second batch duplicates one that is already indexed? I have seen a setup that uses a cache for this check, but it does not seem efficient.
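One common approach is to fingerprint each file with a hash of its raw bytes before it is sent to Document Intelligence, then skip any file whose hash has already been seen. This is a minimal sketch under stated assumptions, not a definitive answer. The names `content_hash`, `filter_new_documents`, and `seen_hashes` are hypothetical, and the in-memory set stands in for wherever the hashes of already-indexed documents would actually be stored.

```python
import hashlib
from typing import Iterable


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw file bytes.

    Hex digits are valid characters for an Azure AI Search document key,
    so the digest could double as the key (an assumption of this sketch).
    """
    return hashlib.sha256(data).hexdigest()


def filter_new_documents(
    batch: Iterable[tuple[str, bytes]], seen_hashes: set[str]
) -> list[tuple[str, str]]:
    """Return (name, hash) pairs for documents not seen before.

    `seen_hashes` is a hypothetical stand-in for the hashes already stored
    in the index. It is updated in place, so duplicates within the same
    batch are skipped as well.
    """
    new_docs = []
    for name, data in batch:
        digest = content_hash(data)
        if digest in seen_hashes:
            continue  # byte-for-byte duplicate: skip extraction and indexing
        seen_hashes.add(digest)
        new_docs.append((name, digest))
    return new_docs


if __name__ == "__main__":
    seen: set[str] = set()
    first = [("a.pdf", b"invoice 1"), ("b.pdf", b"invoice 2")]
    second = [("c.pdf", b"invoice 2"), ("d.pdf", b"invoice 3"), ("e.pdf", b"invoice 3")]
    print([n for n, _ in filter_new_documents(first, seen)])   # ['a.pdf', 'b.pdf']
    print([n for n, _ in filter_new_documents(second, seen)])  # ['d.pdf']
```

In a real pipeline the set of seen hashes would have to persist between batches. One option, assumed here rather than taken from the post, is to use the hash as the index document key: Azure AI Search keys are unique, so uploading a document whose key already exists replaces that document instead of adding a second copy. This check only catches byte-identical files. A re-scanned or re-saved copy of the same document produces a different hash.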
