
Azure AI Search indexer failing with a large number of blob files (8,000) when using skillset chunking

Vishakha Bansal 0 Reputation points
2026-03-05T13:42:53.1333333+00:00

Index Details:

  1. I am using Azure AI Search to index around 8,000 files from Azure Blob Storage.
  2. Each file has a large text field (250k-300k characters).
  3. I am chunking the text with a skillset using a maximumPageLength of 3000 characters and generating embeddings with text-embedding-3-small.
  4. I have already set "Max failed items" and "Max failed items per batch" to -1 on my indexer (see the trimmed configuration sketch after this list).
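
For reference, the setup looks roughly like this. This is a minimal sketch with the azure-search-documents Python SDK; the endpoint, key, and resource names are placeholders, and the embedding skill is omitted.

```python
# Trimmed sketch of the current setup (azure-search-documents Python SDK).
# The endpoint, key, and resource names below are placeholders, not the real resources.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    IndexingParameters,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    SearchIndexer,
    SearchIndexerSkillset,
    SplitSkill,
)

client = SearchIndexerClient("https://<my-service>.search.windows.net",
                             AzureKeyCredential("<admin-key>"))

# Text Split skill: pages of 3,000 characters, as described above.
split_skill = SplitSkill(
    text_split_mode="pages",
    maximum_page_length=3000,
    inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
    outputs=[OutputFieldMappingEntry(name="textItems", target_name="pages")],
)
# The embedding skill (text-embedding-3-small) is attached to the same skillset;
# it is left out of this sketch.
client.create_or_update_skillset(
    SearchIndexerSkillset(name="chunking-skillset", skills=[split_skill])
)

# Indexer over the blob data source with the failure limits disabled.
client.create_or_update_indexer(SearchIndexer(
    name="blob-indexer",
    data_source_name="blob-datasource",
    target_index_name="chunks-index",
    skillset_name="chunking-skillset",
    parameters=IndexingParameters(
        max_failed_items=-1,
        max_failed_items_per_batch=-1,
    ),
))
```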

Issue: The indexer runs for about 2 hours and processes 300-350 files per run. After a few runs it starts returning server-related errors for about 400 files, the run ends in partial success, and eventually the indexer stops picking up the remaining files. However, if I upload only 500 files to blob storage, indexing works fine.

Is there any recommended best practice for indexing and chunking large datasets with Azure AI Search?

Azure AI Search


1 answer

  1. Shree Hima Bindu Maganti 7,190 Reputation points Microsoft External Staff Moderator
    2026-03-06T18:53:57.63+00:00

    Hi @Vishakha Bansal
    When you index a very large number of blobs with a skillset that chunks text and generates embeddings, it is common for indexer runs to process only part of the data set or to fail after long execution times. AI enrichment pipelines are resource-intensive, and extended runs can hit service throttling, long per-document processing times, or temporary authentication issues against the storage account.
    For datasets of thousands of files, the usual recommendations are:
    - Run the indexer on a schedule rather than relying on a single long execution. Scheduled runs continue processing the remaining documents and automatically retry transient failures (see the sketch below).
    - Make sure the search service has sufficient capacity (replicas or partitions) for the enrichment workload.
    - Monitor the indexer execution history so that failed items can be identified and retried.
    Together these steps help ensure that all documents are eventually indexed and reduce intermittent authentication or partial-run issues during large-scale indexing.
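
    If it helps, here is a minimal sketch with the azure-search-documents Python SDK that puts the indexer on a two-hour schedule and inspects the latest execution; the endpoint, key, and indexer name are placeholders for your own resources.

    ```python
    from datetime import timedelta

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents.indexes import SearchIndexerClient
    from azure.search.documents.indexes.models import IndexingSchedule

    client = SearchIndexerClient("https://<your-service>.search.windows.net",
                                 AzureKeyCredential("<admin-key>"))

    # Attach a schedule so remaining blobs are picked up on each run
    # instead of relying on one long execution.
    indexer = client.get_indexer("blob-indexer")  # placeholder name
    indexer.schedule = IndexingSchedule(interval=timedelta(hours=2))
    client.create_or_update_indexer(indexer)

    # Inspect the most recent execution to see what succeeded or failed.
    status = client.get_indexer_status("blob-indexer")
    last = status.last_result
    if last:
        print(last.status, last.item_count, last.failed_item_count)
        for error in last.errors or []:
            print(error.key, error.error_message)

    # If the last run ended in a transient failure, trigger another run on demand.
    if last and last.status == "transientFailure":
        client.run_indexer("blob-indexer")
    ```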
    https://learn.microsoft.com/en-us/azure/search/search-indexer-troubleshooting
    https://learn.microsoft.com/en-us/azure/search/cognitive-search-common-errors-warnings
    Let me know if you need any further assistance.

