Azure Search Indexer Change Detection Issue (Blob storage source) - Reprocessing Files

Quentin Vondel 0 Reputation points
2026-01-02T12:54:06.77+00:00

Recently, when I upload a new PDF file to my blob container and manually run the indexer, the number of processed documents is inconsistent. When I upload 1 new document and run the indexer, it processes the newly uploaded document and the previous one. When I upload 3 new documents, it processes the 3 new documents and N previous ones (1, 2, or 3), even though no changes were made to the previous files (no change in title, metadata, or upload date/time).

  • I have updated thousands of documents before (up to 1 month ago) and never encountered this issue.
  • I tested building a new basic indexer (via Import data, without skills) in a new search deployment (same tenant) and also reproduced the issue in a different tenant; it persists.
  • Originally I had an Azure Function connected as a skillset. I identified from the logs that the indexer was sending 2 separate requests to the function instead of 1.

Patterns:

  • The reprocessing of previous documents always follows the order of upload
  • When setting up a new indexer, the issue always appears starting with the 3rd document
  • When uploading N files, the number of reprocessed files is between 1 and N
  • Sometimes waiting long enough before manually running the indexer solves the issue for 1 run

Configuration

  • Pricing tier = Basic
  • Indexer data source = Blob
  • index key = metadata_storage_path (Base64 encoded)
  • Data source = Azure blob storage
    • default config OR
    • metadata_storage_last_modified = HighWaterMarkChangeDetectionPolicy

Can someone help me understand why the blob storage change detection is acting this way?

Has anyone encountered the same issue and found a solution?

How can I properly log indexer activity (processed file names, processing time, blob change detection trigger, ...)?

Azure AI Search

An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.


2 answers

  1. Golla Venkata Pavani 4,090 Reputation points Microsoft External Staff Moderator
    2026-01-02T13:44:18.1266667+00:00

    Hi @Quentin Vondel,

    Thank you for contacting us about your Azure AI Search indexer reprocessing previously indexed blob files when new documents are uploaded.

    Your indexer is reprocessing previous PDFs because blob change detection relies on LastModified timestamps. When files are uploaded in quick succession, their timestamps can collide and the high-water mark can rewind, causing up to N recent files to be reprocessed. This is expected behavior when using Blob Storage change detection.

    Why does the issue start with the 3rd document?

    This behavior is consistent with Microsoft’s description that:

    • The indexer evaluates blobs in lexicographic order.
    • Closely timed uploads may share similar timestamps.
    • When the high-water mark lands between blob timestamps, blobs just before or after the boundary get re-indexed.

    The issue might also be on the Azure AI Search side: the HighWaterMarkChangeDetectionPolicy may be misreading blob timestamps during rapid uploads, causing old files to be reprocessed.
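One way to reason about the timestamp hypothesis above is to compare each blob's LastModified value with the high-water mark left by the previous run. The sketch below is only an illustration: the function name `blobs_picked_up`, the sample blobs, and the `inclusive` flag are hypothetical, and whether the service compares with `>` or `>=` is an assumption rather than documented behavior.

```python
from datetime import datetime, timezone


def blobs_picked_up(blobs, high_water_mark, inclusive=True):
    """Return the names of blobs a high-water-mark filter would select.

    blobs: iterable of (name, last_modified) pairs.
    inclusive: assumption; True models a `>=` comparison, False a strict `>`.
    """
    selected = []
    # Order by timestamp first, then name, so ties are deterministic.
    for name, last_modified in sorted(blobs, key=lambda b: (b[1], b[0])):
        newer = last_modified > high_water_mark
        tied = inclusive and last_modified == high_water_mark
        if newer or tied:
            selected.append(name)
    return selected


def ts(second):
    """Hypothetical helper: a UTC timestamp within one sample minute."""
    return datetime(2026, 1, 2, 12, 0, second, tzinfo=timezone.utc)


# b.pdf and c.pdf share a timestamp that equals the previous high-water mark.
sample = [("a.pdf", ts(1)), ("b.pdf", ts(2)), ("c.pdf", ts(2)), ("d.pdf", ts(5))]
mark = ts(2)

print(blobs_picked_up(sample, mark, inclusive=True))
print(blobs_picked_up(sample, mark, inclusive=False))
```

Under the inclusive assumption, blobs whose timestamp ties with the mark are selected again alongside the genuinely new blob, which would match the pattern of 1 to N previously indexed files being reprocessed.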

    Please follow the recommended steps below:

    1. Explicitly define change detection using metadata_storage_last_modified.
    2. Upload files with more time between them.
    3. Use scheduled runs instead of manual runs.
    4. Recreate the indexer if the initial configuration lacked change/delete detection policies.
    5. Use Azure Monitor and skillset logs for precise tracking.
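Steps 1 and 3 can be sketched as request bodies for the data source and indexer definitions. This is a minimal sketch, not a complete definition: the helper names, the resource names, and the 15-minute interval are hypothetical placeholders, and the connection string is a dummy value.

```python
import json


def build_blob_data_source(name, container, connection_string):
    """Data source body with an explicit high-water-mark policy (step 1)."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "metadata_storage_last_modified",
        },
    }


def build_schedule(interval="PT15M"):
    """Schedule fragment for an indexer definition (step 3).

    The interval is an ISO 8601 duration; PT15M is an arbitrary example.
    """
    return {"schedule": {"interval": interval}}


# Placeholder names and a dummy connection string, for illustration only.
data_source = build_blob_data_source("pdf-blobs", "documents", "<connection-string>")
print(json.dumps(data_source, indent=2))
print(json.dumps(build_schedule(), indent=2))
```

These bodies would be sent when creating or updating the data source and the indexer; the schedule fragment is merged into the rest of the indexer definition.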

    Azure AI Search provides built-in logging for processed files, times, and triggers, but it's not exhaustive by default. Here's how to set it up properly:
    Execution History (Portal):
    Go to Search service > Indexers > your indexer > Execution details. View run time, duration, documents processed, successes/failures, and base64-encoded document keys (decode to get file names). Change detection is inferred when a blob’s LastModified is newer than the indexer’s high-water mark.
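Decoding a document key back to a blob path can be sketched as follows. This assumes the key was produced by the base64Encode mapping function with its URL-token variant, where trailing `=` padding is replaced by a single digit giving the padding count; if your field mapping uses plain URL-safe Base64 instead, pass `url_token=False`. The function names and the sample URL are hypothetical.

```python
import base64


def decode_document_key(key, url_token=True):
    """Decode a Base64-encoded document key back to the original string."""
    if url_token:
        # Assumption: the last character is a digit counting removed '=' padding.
        pad_count = int(key[-1])
        body = key[:-1] + "=" * pad_count
    else:
        # Plain URL-safe Base64: restore any padding that was stripped.
        body = key + "=" * (-len(key) % 4)
    return base64.urlsafe_b64decode(body).decode("utf-8")


def encode_document_key(path):
    """Inverse of the URL-token decoding above, used here for a round trip."""
    raw = base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii")
    stripped = raw.rstrip("=")
    return stripped + str(len(raw) - len(stripped))


# Hypothetical blob URL, for illustration only.
sample_path = "https://example.blob.core.windows.net/documents/report.pdf"
sample_key = encode_document_key(sample_path)
print(sample_key)
print(decode_document_key(sample_key))
```

If decoding raises an error or returns garbage, the key was probably encoded with the other variant.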

    Enable Diagnostics: In Search service > Monitoring > Diagnostic settings, add a setting. Send logs to Log Analytics, Storage, or Event Hub and enable ExecutionAndOperations and AllLogs.
    Query Logs (Log Analytics): Use KQL to see indexer runs, processed documents, and timestamps. DocumentKey is base64-encoded and can be decoded to blob paths.

    AzureDiagnostics
    | where ResourceType == "SEARCHINDEXER"
    | where OperationName startswith "IndexerExecution"
    

    REST API: Call GET /indexers/{indexer-name}/status to see last run details and the high-water mark.
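The status call can be sketched with the Python standard library as shown below. The service name, indexer name, and API key are placeholders, the helper names are hypothetical, and the `api-version` value is an assumption you may need to adjust. `summarize_last_result` only reads fields from the `lastResult` object of the response; the tracking-state fields are where the high-water mark appears.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # assumption: adjust to the version you target


def build_status_request(service, indexer, api_key):
    """Build the GET request for the indexer status endpoint."""
    url = (
        f"https://{service}.search.windows.net/indexers/{indexer}/status"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def fetch_status(service, indexer, api_key):
    """Send the request and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(build_status_request(service, indexer, api_key)) as resp:
        return json.load(resp)


def summarize_last_result(status):
    """Pick out the last run's counters and tracking state, if present."""
    last = status.get("lastResult") or {}
    fields = (
        "status", "startTime", "endTime", "itemsProcessed", "itemsFailed",
        "initialTrackingState", "finalTrackingState",
    )
    return {name: last.get(name) for name in fields}


# Offline example with a made-up response fragment.
fake_status = {"lastResult": {"status": "success", "itemsProcessed": 4, "itemsFailed": 0}}
print(summarize_last_result(fake_status))
```

Comparing `initialTrackingState` and `finalTrackingState` across consecutive runs is one way to see whether the high-water mark moved backwards between runs.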

    Reference:
    https://learn.microsoft.com/en-us/azure/search/search-how-to-create-indexers?tabs=portal
    https://learn.microsoft.com/en-us/azure/search/search-how-to-index-azure-blob-changed-deleted?tabs=portal
    https://learn.microsoft.com/en-us/azure/search/search-indexer-troubleshooting
    https://learn.microsoft.com/en-us/azure/search/enrichment-cache-how-to-manage
    Kindly let us know if the above answer helps or if you need further assistance with this issue.
    Please "upvote" if the information helped you. This will help us and others in the community as well.


  2. Q&A Assist
    2026-01-02T12:55:20.6933333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    The issue you are experiencing with the Azure Search Indexer processing documents inconsistently when using blob storage as a data source could be related to how change detection is implemented for Azure Blob Storage. Here are some points to consider:

    1. Change Detection: Azure AI Search indexers for Blob Storage have built-in change detection that relies on each blob's LastModified property. If the indexer is processing documents multiple times, it could be due to how the indexer interprets the blob timestamps. If the timestamps are not updated, or if blob storage is slow to reflect changes, the indexer may reprocess documents that it believes have changed.
    2. High Water Mark: The indexer maintains an internal high water mark to track the last processed document. If the indexer runs too frequently or if there are issues with the timing of blob updates, it may lead to documents being processed multiple times. This is especially true if the indexer is invoked manually shortly after uploads.
    3. Manual Invocation: As you noted, manually running the indexer can sometimes yield different results. This could be due to the timing of the indexer execution relative to when the blobs are updated in the storage. It is generally recommended to run indexers on a schedule to avoid such inconsistencies.
    4. Logging Indexer Activity: To log the indexer activity effectively, you can enable detailed logging in Azure Search. This can include logging processed file names, processing times, and the blob change detection triggers. You may need to implement custom logging in your application to capture this information if it is not available by default.
    5. Testing with New Indexers: Since you have tested with new indexers and the issue persists, it may indicate a broader issue with the change detection mechanism or how the indexer interacts with the blob storage.

    If the problem continues, consider reaching out to Azure support for further assistance, as they can provide more specific insights based on your configuration and usage patterns.

