Delete documents from Azure AI Search Index

Hemachandra Siddani 0 Reputation points
2024-08-30T08:44:05.6433333+00:00

Hi, we used the "Import and vectorize data" wizard in the Azure AI Search service to import data from Azure Blob Storage. This created an index and an indexer. The key field in the index shows as Chunk Id. The search works as expected. The problem we are experiencing is that when a document is deleted from Azure Blob Storage, we would like the search service to stop returning data from the deleted document. We would like it to be removed from the vector index.

We tried using the REST API for the index (@search.action = delete), but it expected an ID field, which the "Import and vectorize data" wizard does not create as part of the index schema. Any suggestions on how to go about this issue? Any help is highly appreciated.

Azure AI Search

1 answer

  1. Vinodh247 18,271 Reputation points
    2024-09-12T13:54:10.0266667+00:00

    Hi Hemachandra Siddani,

    Thanks for reaching out to Microsoft Q&A.

    To remove documents from the Azure AI Search index when the associated documents are deleted from Azure Blob Storage, you can explore a few options:

    1. Configure the indexer for soft delete detection:

    Azure AI Search indexers can detect and remove deleted documents through a soft delete policy. For blob storage, the policy is defined on the data source that the indexer reads from: either enable native blob soft delete, or mark a blob as deleted with a custom metadata property (ex: a boolean flag) that the policy watches for. On its next run, the indexer removes the flagged items from the index. Note that a blob that is removed outright, without being soft-deleted or flagged first, is no longer visible to the indexer, so its entries stay in the index.

    2. Use a custom indexer with ID mapping:

    The '@search.action': 'delete' action identifies a document by the index's key field, whatever that field is named; for the wizard-generated index this is the chunk ID key rather than a field called id. If you would rather delete by a source-level identifier (ex: blob file name or any unique attribute from the source), you can add such a field to the index schema and map the document chunks to it. You can then look up the chunk keys for a given source document and remove them through the REST API.

    3. Manually track deletions in blob storage:

    As a more manual approach, you could implement a mechanism that tracks deletions in blob storage (ex: using Event Grid triggers). When a file is deleted in blob storage, trigger a process that invokes the Azure AI Search REST API to remove the corresponding index entries.

    4. Rebuild the index regularly:

    If the number of deletions is low or can be handled periodically, you might consider rebuilding the index from scratch on a schedule. This syncs the index with the current state of blob storage, effectively removing entries for deleted documents.
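For the first option (soft delete detection), the following is a minimal sketch of the data source definition that would carry the deletion detection policy. It only builds the JSON body; sending it to the service is left out. The data source name, container name, connection string placeholder, and the `IsDeleted` metadata property are hypothetical values chosen for illustration, so replace them with your own.

```python
import json

# OData type names of the two deletion detection policies for blob data sources.
SOFT_DELETE_COLUMN = "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy"
NATIVE_BLOB_SOFT_DELETE = "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"


def metadata_soft_delete_policy(column_name, marker_value):
    """Treat a blob as deleted when its metadata property equals the marker value."""
    return {
        "@odata.type": SOFT_DELETE_COLUMN,
        "softDeleteColumnName": column_name,
        "softDeleteMarkerValue": marker_value,
    }


def native_soft_delete_policy():
    """Rely on the storage account's own blob soft delete feature."""
    return {"@odata.type": NATIVE_BLOB_SOFT_DELETE}


def blob_data_source(name, container, connection_string, policy):
    """Build a blob data source definition that includes a deletion detection policy."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        "dataDeletionDetectionPolicy": policy,
    }


if __name__ == "__main__":
    # Hypothetical names; "IsDeleted" is a metadata property you would set on the blob.
    policy = metadata_soft_delete_policy("IsDeleted", "true")
    body = blob_data_source("my-blob-datasource", "my-container", "<connection-string>", policy)
    print(json.dumps(body, indent=2))
```

With the metadata variant, the blob has to be flagged and picked up by an indexer run before it is physically removed from the container.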

    I suggest starting with the first option, configuring the indexer for soft delete detection, as it automates the process. However, if that does not fit your case, adding a unique ID field to your schema for API-based deletion would be a more precise solution.
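For the API-based route, here is a rough sketch of the pieces involved: picking out deleted blobs from Event Grid events, building a filter that finds all chunks of one source document, and building the delete batch for the documents endpoint. The field names `chunk_id` and `parent_id` are assumptions about the wizard-generated schema, so check your own index (the parent field also needs to be filterable). How a blob URL maps to the parent value depends on your indexer's field mappings and is not shown.

```python
import json


def deleted_blob_urls(events):
    """Return blob URLs from Event Grid events of type Microsoft.Storage.BlobDeleted."""
    return [
        event["data"]["url"]
        for event in events
        if event.get("eventType") == "Microsoft.Storage.BlobDeleted"
    ]


def parent_filter(parent_field, parent_key):
    """OData filter that selects every chunk belonging to one source document."""
    escaped = parent_key.replace("'", "''")  # single quotes are doubled in OData strings
    return f"{parent_field} eq '{escaped}'"


def build_delete_batch(key_field, keys):
    """Body for POST /indexes/{index}/docs/index that deletes documents by key."""
    return {"value": [{"@search.action": "delete", key_field: key} for key in keys]}


if __name__ == "__main__":
    # Hypothetical chunk keys, as returned by a query using the filter above.
    print(parent_filter("parent_id", "<parent-key>"))
    print(json.dumps(build_delete_batch("chunk_id", ["<chunk-key-1>", "<chunk-key-2>"]), indent=2))
```

The delete action only needs the key field in each entry; other fields in the payload are ignored.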

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.
