How does using multiple Indexers work?

Question

S-A 25

I have created a Search Service on Azure and I want to speed up the indexing process, so I have created multiple indexers that index data from the same data source into the same index. This made the process faster, but I don't understand how it works. Let's say the first indexer ran and indexed one document. Does the second indexer re-index that same document? If so, which indexed version is stored in the index?

Another question: if the first indexer ran and indexed only part of a document (not all of it), does the second indexer take the next part of that document, or does it start over?

I basically want to understand how using multiple indexers works in the background.

This website's assistant generated the following answer. Can someone verify its accuracy?

"When using multiple indexers in Azure AI Search, each indexer can pull from the same data source and write to the same search index. Each indexer is distinct and can run at the same time, populating the search index more quickly than if they ran sequentially. If the first indexer ran and indexed one document, the second indexer will not re-index that same document. The second indexer will only index new or updated documents that were not indexed by the first indexer. If the first indexer ran and indexed part of a document, the second indexer will start over and index the entire document."

Accepted answer

Answer 1

@S-A You're correct that using multiple indexers allows indexing the same data source in parallel, accelerating the process.

The Azure AI Search (formerly Azure Cognitive Search) docs provide some additional details on how this works:

This doc provides strategies for indexing large data sets in Azure AI Search. It mentions that if two indexers retrieve the same item, the indexer that completes last will overwrite the existing indexed document.
Indexers track state about what they've already indexed using a high-water mark. This prevents re-indexing the full source on every run.
If one indexer crashes partway through a large item, the next indexer will start over on that item rather than continuing where the first one left off.
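The behaviors listed above can be illustrated with a small in-memory model. This is only a sketch, not the Azure SDK or the service's actual implementation: `ToyIndex`, `ToyIndexer`, and the `modified` field are hypothetical names, and the model assumes that each indexer keeps its own high-water mark and that writes to the index are upserts keyed by document ID.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """In-memory stand-in for a search index: one stored version per document key."""
    docs: dict = field(default_factory=dict)

    def upsert(self, key, content, written_by):
        # Writing an existing key replaces the stored version (last write wins).
        self.docs[key] = {"content": content, "written_by": written_by}


@dataclass
class ToyIndexer:
    """Hypothetical indexer that keeps its own high-water mark (an assumption)."""
    name: str
    high_water_mark: int = 0  # largest 'modified' value processed so far

    def run(self, source, index):
        # Only pick up rows changed since this indexer's previous run.
        pending = sorted(
            (row for row in source if row["modified"] > self.high_water_mark),
            key=lambda row: row["modified"],
        )
        for row in pending:
            index.upsert(row["id"], row["content"], self.name)
            self.high_water_mark = row["modified"]
        return len(pending)


source = [
    {"id": "doc-1", "content": "v1", "modified": 1},
    {"id": "doc-2", "content": "v1", "modified": 2},
]
index = ToyIndex()
indexer_a = ToyIndexer("indexer-a")
indexer_b = ToyIndexer("indexer-b")

first_a = indexer_a.run(source, index)   # processes both rows
first_b = indexer_b.run(source, index)   # separate mark, so it also processes both rows
second_a = indexer_a.run(source, index)  # nothing has changed since indexer-a's mark
```

In this toy run, the index still holds one copy of each document, and that copy is whichever write landed last. A second run of the same indexer finds nothing to do because its own mark has already advanced. Whether the real service behaves this way for a given data source should be checked against the indexer documentation.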

So in summary:

Multiple indexers pull data in parallel to accelerate indexing.
They track state to avoid re-indexing documents they have already indexed.
The last write wins if two indexers index the same document.
An item is re-indexed in full if an indexer crashes midway through it.

Using multiple indexers is a good approach to scale out indexing throughput. The trade-off is that index consistency can suffer when indexers overlap on the same documents.
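One way to reduce the overlap mentioned above is to give each indexer a disjoint slice of the source, so that no document is picked up twice. The sketch below shows the idea with a stable hash of the document key. The function names and the hash-based split are assumptions made for illustration; in practice the split would more likely live in how each data source is defined (for example, a separate query or folder per indexer), which is not shown here.

```python
import zlib


def partition_for(key: str, indexer_count: int) -> int:
    """Map a document key to one partition using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % indexer_count


def split_keys(keys, indexer_count):
    """Group keys so that each one lands in exactly one partition."""
    partitions = [[] for _ in range(indexer_count)]
    for key in keys:
        partitions[partition_for(key, indexer_count)].append(key)
    return partitions


keys = [f"doc-{n}" for n in range(10)]
partitions = split_keys(keys, 3)  # one list of keys per hypothetical indexer
```

Because every key maps to exactly one partition, two indexers working from different partitions never write the same document, and the last-write-wins question does not arise.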

Hope that helps. Let us know if you have additional questions about using multiple Indexers.

-Grace
