Documents not deleted when a blob is removed, using indexers automatically generated from Azure OpenAI Data Studio

Fulwani, Yash 5 Reputation points
2024-07-16T15:43:17.9266667+00:00

I am testing an Azure OpenAI On Your Data solution. I have set up an Azure OpenAI resource and walked through the 'Add your data' workflow with Azure Blob Storage as the backing data source. This generates two indexers in the associated Azure AI Search resource: one indexer to chunk the data, and one indexer to index the chunks.

Periodically we upload, change, or delete blobs in the source container and re-run the indexers. We have observed that when a blob is removed, the associated documents no longer appear in the first index (populated by the indexer that chunks the documents) but still appear in the final index (populated by the indexer that indexes the chunked documents), leaving them as orphaned documents.
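To make the symptom concrete, here is a minimal sketch of how the orphaned chunks could be identified, assuming each chunk document in the final index carries a field pointing back to its source blob. The field names `id` and `parent_blob` are hypothetical (the generated index may name them differently), and the inputs are plain Python data rather than live calls to the search service or the storage account.

```python
from typing import Dict, Iterable, List


def find_orphaned_chunks(
    chunk_docs: Iterable[Dict[str, str]],
    existing_blobs: Iterable[str],
) -> List[str]:
    """Return the ids of chunk documents whose source blob no longer exists.

    `chunk_docs` stands in for documents read from the final index and
    `existing_blobs` for the blob names currently in the container.
    Both field names used here are assumptions for illustration.
    """
    existing = set(existing_blobs)
    return [doc["id"] for doc in chunk_docs if doc["parent_blob"] not in existing]


# Example: b.pdf was deleted from the container, but its chunk is still indexed.
chunks = [
    {"id": "chunk-1", "parent_blob": "a.pdf"},
    {"id": "chunk-2", "parent_blob": "b.pdf"},
]
orphans = find_orphaned_chunks(chunks, ["a.pdf"])
```

In this example `orphans` contains only the chunk whose source blob is gone, which matches what we see in the final index after a delete.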

Is anyone else seeing orphaned documents when implementing Azure OpenAI On Your Data with Azure Blob Storage through the Azure OpenAI Data Studio? I suspect that either (a) the first indexer is still chunking removed documents, or (b) the second indexer is failing to detect or remove deleted documents from where the chunking indexer stores them.

I have confirmed that our blob storage containers and the search data sources meet the requirements outlined here that should allow indexers to detect and remove deleted documents.

The default deletion policy that is created in the data source is:
"dataDeletionDetectionPolicy": { "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy" }
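For anyone who wants to check the same thing on their own data sources, the following is a small sketch of reading the policy type out of a data source definition. It assumes the definition has already been retrieved and parsed into a Python dict; the helper name `deletion_policy_type` and the sample data source name are hypothetical, and only the `dataDeletionDetectionPolicy` fragment is taken from my actual data source.

```python
from typing import Any, Dict, Optional

# Policy type string as it appears in the generated data source.
NATIVE_BLOB_SOFT_DELETE = (
    "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
)


def deletion_policy_type(data_source: Dict[str, Any]) -> Optional[str]:
    """Return the @odata.type of the deletion detection policy, or None if unset."""
    policy = data_source.get("dataDeletionDetectionPolicy")
    if not policy:
        return None
    return policy.get("@odata.type")


# Hypothetical data source definition; only the policy block mirrors the real one.
sample_data_source = {
    "name": "example-datasource",
    "dataDeletionDetectionPolicy": {"@odata.type": NATIVE_BLOB_SOFT_DELETE},
}

uses_native_soft_delete = (
    deletion_policy_type(sample_data_source) == NATIVE_BLOB_SOFT_DELETE
)
```

Running the same check against both generated data sources would show whether they share this policy or whether one of them has none at all.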

Should I change it?

Thank You

Yash

Azure AI Search
Azure OpenAI Service