How does using multiple Indexers work?

S-A 25 Reputation points
2024-01-09T10:12:29.9733333+00:00

I have created a Search Service on Azure and I want to speed up the indexing process, so I created multiple indexers that index data from the same data source into the same index. This made the process faster, but I don't understand how it works. Say the first indexer ran and indexed one document: does the second indexer re-index that same document? If so, which indexed version is stored in the index?

Another question: if the first indexer ran and indexed only part of a document, does the second indexer pick up the next part of that document, or does it start over?

I basically want to understand how using multiple indexers works in the background.

This website's assistant generated the following answer. Can someone verify its accuracy?

"When using multiple indexers in Azure AI Search, each indexer can pull from the same data source and write to the same search index. Each indexer is distinct and can run at the same time, populating the search index more quickly than if they ran sequentially. If the first indexer ran and indexed one document, the second indexer will not re-index that same document. The second indexer will only index new or updated documents that were not indexed by the first indexer. If the first indexer ran and indexed part of a document, the second indexer will start over and index the entire document."

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

Accepted answer
  1. Grmacjon-MSFT 19,151 Reputation points Moderator
    2024-01-17T07:03:37.89+00:00

    @S-A You're correct that using multiple indexers allows indexing the same data source in parallel, accelerating the process.

    These Azure AI Search (formerly Azure Cognitive Search) docs provide some additional details on how this works:

    • This doc provides strategies for indexing large data sets in Azure AI Search. It mentions that if two indexers retrieve the same item, the indexer that completes last overwrites the existing indexed document.
    • Each indexer tracks its own state about what it has already indexed, using a high water mark. This prevents that indexer from re-indexing the full source on every run.
    • If one indexer crashes partway through a large item, the next indexer starts over on that item rather than continuing where the first left off.
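
To make the points above concrete, here is a minimal toy model in Python. It is not the service's implementation: `SourceRow`, `ToyIndexer`, and the integer `modified` marker are hypothetical names made up for illustration, and the sketch assumes each indexer keeps its own high water mark and that writes to the index are upserts keyed by document key.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    key: str
    content: str
    modified: int  # assumed change marker that only ever increases


@dataclass
class ToyIndexer:
    name: str
    high_water_mark: int = -1  # private to this indexer, not shared

    def run(self, source, index):
        """Upsert every row changed since this indexer's own last run."""
        changed = [r for r in source if r.modified > self.high_water_mark]
        for row in sorted(changed, key=lambda r: r.modified):
            # Upsert by key: a document with the same key is replaced.
            index[row.key] = {"content": row.content, "indexed_by": self.name}
            self.high_water_mark = row.modified
        return len(changed)


source = [SourceRow("doc1", "v1", 1), SourceRow("doc2", "v1", 2)]
index = {}
indexer_a = ToyIndexer("indexer-a")
indexer_b = ToyIndexer("indexer-b")

first = indexer_a.run(source, index)   # processes both rows
second = indexer_b.run(source, index)  # its own mark is untouched, so both rows again
third = indexer_a.run(source, index)   # nothing changed since indexer-a's last run
```

In this model the second indexer does process `doc1` again, and the index ends up holding whichever copy was written last. Only a later run of the same indexer skips unchanged rows.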

    So in summary:

    • Multiple indexers pull data in parallel to accelerate indexing.
    • Each indexer tracks its own state to avoid re-indexing documents it has already processed.
    • Last write wins if two indexers index the same document.
    • An item is re-indexed in full if an indexer crashes midway through it.

    Using multiple indexers is a good way to scale out indexing throughput. The tradeoff is index consistency: if indexers overlap on the same documents, the stored version depends on which one finishes last.
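
One way to avoid that overlap is to give each indexer its own slice of the source. The sketch below only builds the JSON-style definitions and sends nothing to the service. It assumes a blob container organised into virtual folders, which the question does not state. `build_partitioned_definitions`, the folder names, and the naming scheme are hypothetical. The property names (`container.query`, `dataSourceName`, `targetIndexName`) follow the data source and indexer definitions as I understand them, so please check them against the current REST reference before relying on this.

```python
def build_partitioned_definitions(index_name, container, folders, connection_string):
    """Return one (data source, indexer) definition pair per virtual folder.

    Every indexer targets the same index, but each data source is scoped
    to a different folder, so no two indexers read the same blobs.
    """
    definitions = []
    for folder in folders:
        suffix = folder.strip("/").replace("/", "-")
        data_source = {
            "name": f"{index_name}-ds-{suffix}",
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            # Assumption: 'query' restricts the data source to one virtual folder.
            "container": {"name": container, "query": folder},
        }
        indexer = {
            "name": f"{index_name}-indexer-{suffix}",
            "dataSourceName": data_source["name"],
            "targetIndexName": index_name,
        }
        definitions.append((data_source, indexer))
    return definitions


# Hypothetical index, container, and folder names; placeholder connection string.
definitions = build_partitioned_definitions(
    "docs-index", "documents", ["2022", "2023", "2024"], "<connection-string>"
)
```

With disjoint slices, the last-write-wins question does not arise, because no document key is written by more than one indexer.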

    Hope that helps. Let us know if you have additional questions about using multiple Indexers.

    -Grace
