Avoid re-indexing all documents after migration of an Azure Search Index

Giacomo Bianco 20 Reputation points
2024-03-13T10:30:45.5333333+00:00

I used the Backup and Restore project to migrate my Azure Search resource to a different service tier (downgrade from S2 to S1).

The new index in the new Azure Search resource is identical to the old one, and this is fine.

The problem arises when I re-create the indexer: it starts indexing ALL THE DOCUMENTS from the data source again, regardless of whether they are already present in the index, and it executes the whole skillset again.

How can I avoid it?

If it is necessary to run the indexer so that it can save the watermarks of already-indexed files, is there a way to make it run with NO ACTION on the index and without executing the skillset? Otherwise it will take days to index all the documents in the data source again :(

Thank you all

Azure AI Search

Accepted answer
  1. Grmacjon-MSFT 19,151 Reputation points Moderator
    2024-03-14T22:21:31.73+00:00

    Hi @Giacomo Bianco, we're sorry to hear you're facing this issue.

    You're correct, re-creating the indexer in Azure Search after a migration using Backup and Restore can lead to unnecessary re-indexing of all documents. This happens because the indexer keeps track of which documents have been indexed through an internal high-water mark. When you recreate the indexer, this high-water mark is reset, causing all documents to be re-indexed.

    Unfortunately, there’s no built-in way to make the indexer run with no action on the index and without executing the skillset. The indexer doesn’t have a mechanism to know which documents are already present in the target index.
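    For reference, here is a minimal sketch (using the azure-search-documents Python SDK; the endpoint, admin key, and indexer name are placeholders) showing how to inspect what a re-created indexer has processed. It illustrates that the change-tracking state and execution history belong to the indexer itself, which is why a freshly created indexer starts from scratch:

```python
# Minimal sketch using the azure-search-documents Python SDK (11.x).
# The endpoint, admin key, and indexer name below are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

client = SearchIndexerClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<admin-api-key>"),
)

# The high-water mark / change-tracking state belongs to the indexer, not to
# the index, so a newly created indexer has an empty execution history.
status = client.get_indexer_status("<your-indexer-name>")
print("Indexer status:", status.status)
if status.last_result:
    print("Items processed in last run:", status.last_result.item_count)
    print("Items failed in last run:", status.last_result.failed_item_count)
```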

    However, there are a few strategies you can consider to avoid repeated full re-indexing or reduce its impact:

    1. If your data source supports it, you can configure your indexer to do incremental indexing. For example, if you’re using Azure Blob Storage, the indexer uses each blob’s LastModified timestamp (surfaced as metadata_storage_last_modified) to identify changed files, so after the initial full run it will not re-index everything again (see the sketch after this list).
    2. If your documents are small, you can increase the indexer’s batch size (also shown in the sketch below) to speed up the indexing process.
    3. If your data source is partitioned, you can create multiple indexers to index different partitions in parallel.
    4. You can schedule the indexer to run during off-peak hours to minimize the impact on your application’s performance.
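    To make points 1, 2 and 4 concrete, here is a hedged sketch using the azure-search-documents Python SDK; all resource names, the connection string, and the schedule values are placeholder assumptions, not your actual configuration. It re-creates a blob data source and an indexer with a larger batch size and an off-peak daily schedule. For Azure Blob Storage, change detection based on the blob’s LastModified timestamp is built in, so the first run is still a full run (skillset included), but subsequent scheduled runs should only pick up new or changed blobs:

```python
# Hedged sketch using the azure-search-documents Python SDK (11.x).
# All resource names, the connection string, and the schedule are placeholders.
from datetime import datetime, timedelta, timezone

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    IndexingParameters,
    IndexingSchedule,
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

client = SearchIndexerClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<admin-api-key>"),
)

# Blob data source: change detection based on each blob's LastModified
# timestamp is built in, so no explicit change detection policy is needed.
data_source = SearchIndexerDataSourceConnection(
    name="docs-blob-datasource",
    type="azureblob",
    connection_string="<storage-connection-string>",
    container=SearchIndexerDataContainer(name="documents"),
)
client.create_or_update_data_source_connection(data_source)

indexer = SearchIndexer(
    name="docs-indexer",
    data_source_name=data_source.name,
    target_index_name="docs-index",
    skillset_name="docs-skillset",                  # omit if you have no skillset
    parameters=IndexingParameters(batch_size=100),  # point 2: larger batches
    schedule=IndexingSchedule(                      # point 4: off-peak daily runs
        interval=timedelta(days=1),
        start_time=datetime(2024, 3, 16, 2, 0, tzinfo=timezone.utc),
    ),
)
client.create_or_update_indexer(indexer)

# The first run still processes every document (and runs the skillset), but
# later scheduled runs only pick up blobs whose LastModified has changed.
```

    For point 3, one option (again an assumption about how your data is laid out) is to create several data sources that each point at a different virtual folder, using SearchIndexerDataContainer(name="documents", query="<folder-prefix>"), and pair each one with its own indexer so the partitions are processed in parallel.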

    Best,

    Grace

