Azure Search Indexer Change Detection Issue (Blob storage source) - Reprocessing Files

Question

Quentin Vondel 0

Recently, when I upload a new PDF file to my blob container and manually run the indexer, the number of processed documents is inconsistent. When I upload 1 new document and run the indexer, it processes the newly uploaded document and the previous one. When I upload 3 new documents, it processes the 3 new documents and N previous ones (1, 2 or 3), even though no changes were made to the previous files (no change in title, metadata, or upload date/time).

I have updated thousands of documents before (up to 1 month ago) and never encountered this issue.
I tested building a new basic indexer (via Import data, without skills) in a new search deployment (same tenant) and also replicated it in a different tenant, and the issue persists.
Originally I had an Azure Function connected as a skillset. I identified from the logs that the indexer was sending 2 individual requests to the function instead of 1.

Patterns:

- The reprocessing of previous documents is always in order of upload
- When setting up a new indexer, the issue always appears starting with the 3rd document
- When uploading N files, the number of reprocessed files is between 1 and N
- Sometimes waiting long enough before manually running the indexer solves the issue for 1 run

Configuration

Pricing tier = Basic
Indexer data source = Blob
index key = metadata_storage_path (Base64 encoded)
Data source = Azure blob storage
- default config OR
- metadata_storage_last_modified = HighWaterMarkChangeDetectionPolicy

Can someone help me understand why the blob storage change detection is acting this way?

Has anyone encountered the same issue and found a solution?

How can I properly log the indexer activity (processed file names, processing time, blob change detection trigger, ...)?

Golla Venkata Pavani 4,090 Reputation points Microsoft External Staff Moderator

2026-01-05T18:25:40.67+00:00

Hi @Quentin Vondel,

I am just checking in to see whether you had a chance to review my previous response. If you have any further questions, do let us know.

Quentin Vondel 0 Reputation points

2026-01-07T14:44:32.5433333+00:00

Hi, thank you for the response.

I have been running the indexer for a few days in production and it seems to behave normally so it is probably related to the cadence of blob uploads and manual runs as you mentioned.
I'm still not sure why this never happened to me before.

I wish the monitoring of processed/indexed documents was made a little easier but I guess that is a different topic.

Regards,

Aditya N 2,795 Reputation points Microsoft External Staff Moderator

2026-01-12T07:09:59.49+00:00

Hello @Quentin Vondel

The issue might be on the Azure AI Search side: the HighWaterMarkChangeDetectionPolicy may be modifying blob timestamps during rapid uploads, causing old files to be reprocessed.

Answer 1

Hi @Quentin Vondel,

Thank you for contacting us about your Azure Cognitive Search indexer reprocessing previously indexed blob files when new documents are uploaded. This is expected behavior when using Blob Storage change detection.

Your indexer is reprocessing previous PDFs because Azure uses LastModified timestamps to detect changes. When files are uploaded in quick succession, their timestamps can collide and the high-water mark rewinds, causing up to N recent files to be reprocessed.

Why does the issue start with the 3rd document?

This behavior is consistent with Microsoft’s description that:

- The indexer evaluates blobs in lexicographic order.
- Closely timed uploads may share similar timestamps.
- When the high-water mark lands between blob timestamps, blobs before or after the boundary get re-indexed.
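The boundary effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Azure AI Search's actual implementation: it assumes the indexer selects blobs whose LastModified is greater than or equal to the stored high-water mark (an inclusive comparison, chosen so that blobs sharing the mark's timestamp are never skipped). The function name `run_indexer` and the blob names are hypothetical.

```python
from datetime import datetime, timedelta


def run_indexer(blobs, high_water_mark, inclusive=True):
    """Toy model of one indexer run (hypothetical, for illustration only).

    blobs maps blob name -> LastModified timestamp.
    inclusive=True models a ">=" comparison against the mark (an assumption).
    Returns (names selected for processing, new high-water mark).
    """
    if high_water_mark is None:
        selected = dict(blobs)  # first run: everything is new
    elif inclusive:
        selected = {n: t for n, t in blobs.items() if t >= high_water_mark}
    else:
        selected = {n: t for n, t in blobs.items() if t > high_water_mark}
    # The new mark is the latest timestamp seen in this run.
    new_mark = max(selected.values(), default=high_water_mark)
    # Process in timestamp order, breaking ties by name.
    ordered = sorted(selected, key=lambda n: (selected[n], n))
    return ordered, new_mark


t0 = datetime(2026, 1, 1, 12, 0, 0)
blobs = {"a.pdf": t0, "b.pdf": t0 + timedelta(seconds=1)}
first_run, mark = run_indexer(blobs, None)

# Upload one new file, then run again without touching the old ones.
blobs["c.pdf"] = t0 + timedelta(seconds=2)
second_run, mark = run_indexer(blobs, mark)
print(second_run)  # ['b.pdf', 'c.pdf']
```

In this model the blob sitting exactly on the mark (`b.pdf`) is picked up again together with the new upload, which resembles the "new document plus the previous one" pattern from the question. Whether the real service behaves this way is not confirmed by this thread.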

Please try the following recommended steps:

- Explicitly define change detection using metadata_storage_last_modified.
- Upload files with more time spacing.
- Use scheduled runs instead of manual runs.
- Recreate the indexer if the initial configuration lacked change/delete policies.
- Use Azure Monitor + skillset logs for precise tracking.
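As a rough sketch of the "scheduled runs" step, the snippet below builds an indexer definition with a `schedule` block that could be sent to the service when creating or updating the indexer. The names `blob-indexer`, `blob-datasource`, and `docs-index` are placeholders, the helper `indexer_definition` is hypothetical, and the 15-minute interval is an arbitrary choice. The range check reflects the documented scheduling limits of 5 minutes to 24 hours.

```python
import json


def indexer_definition(name, data_source, index, interval_minutes):
    """Build an indexer definition with a recurring schedule (illustrative helper)."""
    if not 5 <= interval_minutes <= 1440:
        raise ValueError("interval must be between 5 and 1440 minutes")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        # The interval is an ISO 8601 / XSD dayTimeDuration, e.g. PT15M.
        "schedule": {"interval": f"PT{interval_minutes}M"},
    }


# Placeholder names; replace with your own resources.
definition = indexer_definition("blob-indexer", "blob-datasource", "docs-index", 15)
print(json.dumps(definition, indent=2))
```

A schedule spaces runs at a fixed cadence, which avoids triggering a run immediately after a burst of uploads.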

Azure AI Search provides built-in logging for processed files, times, and triggers, but it's not exhaustive by default. Here's how to set it up properly:

Execution History (Portal): Go to Search service > Indexers > your indexer > Execution details. View run time, duration, documents processed, successes/failures, and base64-encoded document keys (decode to get file names). Change detection is inferred when a blob's LastModified is newer than the indexer's high-water mark.
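Decoding the document keys back to blob paths might look like the sketch below. It assumes the key was produced by the `base64Encode` field mapping with its default URL-token style encoding, where the result is URL-safe base64 with the `=` padding stripped and a single digit (the padding count) appended. If your field mapping uses different parameters, the trailing-digit handling will not apply. The function names and the example URL are hypothetical.

```python
import base64


def encode_key(blob_url: str) -> str:
    """Encode a blob URL the way the assumed URL-token scheme does."""
    raw = base64.urlsafe_b64encode(blob_url.encode("utf-8")).decode("ascii")
    stripped = raw.rstrip("=")
    # Append the number of padding characters that were removed.
    return stripped + str(len(raw) - len(stripped))


def decode_key(key: str) -> str:
    """Decode a document key back to the blob URL (assumed URL-token scheme)."""
    if not key or not key[-1].isdigit():
        raise ValueError("expected a trailing padding-count digit")
    padding = int(key[-1])
    return base64.urlsafe_b64decode(key[:-1] + "=" * padding).decode("utf-8")


# Hypothetical blob URL, used only to show the round trip.
url = "https://example.blob.core.windows.net/docs/report.pdf"
key = encode_key(url)
print(key)
print(decode_key(key))
```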

Enable Diagnostics: In Search service > Monitoring > Diagnostic settings, add a setting. Send logs to Log Analytics, Storage, or Event Hub and enable ExecutionAndOperations and AllLogs.
Query Logs (Log Analytics): Use KQL to see indexer runs, processed documents, and timestamps. DocumentKey is base64-encoded and can be decoded to blob paths.

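// Indexer-related entries from the search service's diagnostic logs.
// Adjust the filter values below to match what appears in your workspace.
AzureDiagnostics
| where ResourceType == "SEARCHINDEXER"
| where OperationName startswith "IndexerExecution"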

REST API: Call GET /indexers/{indexer-name}/status to see last run details and the high-water mark.
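A minimal sketch of calling the status endpoint and summarizing each run follows. The service name, indexer name, and API key are placeholders you would supply, and the `api-version` value is an assumption (use whichever version your service supports). The field names read from the response (`executionHistory`, `itemsProcessed`, `itemsFailed`, `finalTrackingState`, and so on) are those of the Get Indexer Status response; the helper names are hypothetical.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # assumption: adjust to a version your service supports


def status_url(service: str, indexer: str, api_version: str = API_VERSION) -> str:
    """Build the Get Indexer Status URL for a search service."""
    return (
        f"https://{service}.search.windows.net"
        f"/indexers/{indexer}/status?api-version={api_version}"
    )


def fetch_status(service: str, indexer: str, api_key: str) -> dict:
    """Call the status endpoint with an admin API key and return the parsed JSON."""
    request = urllib.request.Request(
        status_url(service, indexer), headers={"api-key": api_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_runs(status: dict) -> list:
    """Reduce each entry of executionHistory to the fields useful for tracking."""
    return [
        {
            "status": run.get("status"),
            "start": run.get("startTime"),
            "end": run.get("endTime"),
            "processed": run.get("itemsProcessed"),
            "failed": run.get("itemsFailed"),
            "final_tracking_state": run.get("finalTrackingState"),
        }
        for run in status.get("executionHistory", [])
    ]


# Example usage (requires network access and real credentials):
# for row in summarize_runs(fetch_status("my-service", "my-indexer", "<admin-key>")):
#     print(row)
```

Comparing `processed` across consecutive runs against the number of files you uploaded is one way to spot reprocessing without opening the portal.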

References:
https://learn.microsoft.com/en-us/azure/search/search-how-to-create-indexers?tabs=portal
https://learn.microsoft.com/en-us/azure/search/search-how-to-index-azure-blob-changed-deleted?tabs=portal
https://learn.microsoft.com/en-us/azure/search/search-indexer-troubleshooting
https://learn.microsoft.com/en-us/azure/search/enrichment-cache-how-to-manage

Kindly let us know if the above comment helps or if you need further assistance on this issue.
Please "upvote" if the information helped you. This will help us and others in the community as well.

Answer 2

The issue you are experiencing with the Azure Search Indexer processing documents inconsistently when using blob storage as a data source could be related to how change detection is implemented for Azure Blob Storage. Here are some points to consider:

Change Detection: Azure Blob Storage has built-in change detection that relies on the LastModified property of blobs. If the indexer is processing documents multiple times, it could be due to how the indexer is interpreting the timestamps of the blobs. If the timestamps are not updated or if there are delays in the blob storage reflecting the changes, the indexer may reprocess documents that it believes have changed.
High Water Mark: The indexer maintains an internal high water mark to track the last processed document. If the indexer runs too frequently or if there are issues with the timing of blob updates, it may lead to documents being processed multiple times. This is especially true if the indexer is invoked manually shortly after uploads.
Manual Invocation: As you noted, manually running the indexer can sometimes yield different results. This could be due to the timing of the indexer execution relative to when the blobs are updated in the storage. It is generally recommended to run indexers on a schedule to avoid such inconsistencies.
Logging Indexer Activity: To log the indexer activity effectively, you can enable detailed logging in Azure Search. This can include logging processed file names, processing times, and the blob change detection triggers. You may need to implement custom logging in your application to capture this information if it is not available by default.
Testing with New Indexers: Since you have tested with new indexers and the issue persists, it may indicate a broader issue with the change detection mechanism or how the indexer interacts with the blob storage.

If the problem continues, consider reaching out to Azure support for further assistance, as they can provide more specific insights based on your configuration and usage patterns.
