Avoid re-indexing all documents after migration of an Azure Search Index

Giacomo Bianco 20 Reputation points
2024-03-13T10:30:45.5333333+00:00

I used the Backup and Restore project to migrate my Azure Search resource to a different service tier (downgrade from S2 to S1).

The new index in the new Azure Search resource is identical to the old one, and this is fine.

The problem arises when I re-create the indexer: it starts indexing ALL THE DOCUMENTS from the data source again, regardless of whether they are already present in the index, and it executes the whole skillset again.

How can I avoid it?

If it is necessary to run the indexer so that it can save the watermarks of already-indexed files, is there a way to make it run with NO ACTION on the index and without executing the skillset? Otherwise it will take days to index all the documents in the data source again :(

Thank you all

Azure AI Search

Accepted answer
  1. Grmacjon-MSFT 19,151 Reputation points Moderator
    2024-03-14T22:21:31.73+00:00

    Hi @Giacomo Bianco, we're sorry to hear you're facing this issue.

    You're correct, re-creating the indexer in Azure Search after a migration using Backup and Restore can lead to unnecessary re-indexing of all documents. This happens because the indexer keeps track of which documents have been indexed through an internal high-water mark. When you recreate the indexer, this high-water mark is reset, causing all documents to be re-indexed.

    Unfortunately, there’s no built-in way to make the indexer run with no action on the index and without executing the skillset. The indexer doesn’t have a mechanism to know which documents are already present in the target index.
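    For reference, here is a minimal sketch (using the azure-search-documents Python SDK; the endpoint, admin key, and indexer name are placeholders) showing how to inspect what a re-created indexer has processed. It illustrates that the change-tracking state and execution history belong to the indexer itself, which is why a freshly created indexer starts from scratch:

```python
# Minimal sketch using the azure-search-documents Python SDK (11.x).
# The endpoint, admin key, and indexer name below are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

client = SearchIndexerClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<admin-api-key>"),
)

# The high-water mark / change-tracking state belongs to the indexer, not to
# the index, so a newly created indexer has an empty execution history.
status = client.get_indexer_status("<your-indexer-name>")
print("Indexer status:", status.status)
if status.last_result:
    print("Items processed in last run:", status.last_result.item_count)
    print("Items failed in last run:", status.last_result.failed_item_count)
```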

    However, there are a few strategies you can consider to avoid repeated full re-indexing or reduce its impact:

    1. If your data source supports it, you can configure your indexer to do incremental indexing. For example, if you’re using Azure Blob Storage, the indexer uses each blob’s LastModified timestamp (surfaced as metadata_storage_last_modified) to identify changed files, so after the initial full run it will not re-index everything again (see the sketch after this list).
    2. If your documents are small, you can increase the indexer’s batch size (also shown in the sketch below) to speed up the indexing process.
    3. If your data source is partitioned, you can create multiple indexers to index different partitions in parallel.
    4. You can schedule the indexer to run during off-peak hours to minimize the impact on your application’s performance.
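    To make points 1, 2 and 4 concrete, here is a hedged sketch using the azure-search-documents Python SDK; all resource names, the connection string, and the schedule values are placeholder assumptions, not your actual configuration. It re-creates a blob data source and an indexer with a larger batch size and an off-peak daily schedule. For Azure Blob Storage, change detection based on the blob’s LastModified timestamp is built in, so the first run is still a full run (skillset included), but subsequent scheduled runs should only pick up new or changed blobs:

```python
# Hedged sketch using the azure-search-documents Python SDK (11.x).
# All resource names, the connection string, and the schedule are placeholders.
from datetime import datetime, timedelta, timezone

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    IndexingParameters,
    IndexingSchedule,
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

client = SearchIndexerClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<admin-api-key>"),
)

# Blob data source: change detection based on each blob's LastModified
# timestamp is built in, so no explicit change detection policy is needed.
data_source = SearchIndexerDataSourceConnection(
    name="docs-blob-datasource",
    type="azureblob",
    connection_string="<storage-connection-string>",
    container=SearchIndexerDataContainer(name="documents"),
)
client.create_or_update_data_source_connection(data_source)

indexer = SearchIndexer(
    name="docs-indexer",
    data_source_name=data_source.name,
    target_index_name="docs-index",
    skillset_name="docs-skillset",                  # omit if you have no skillset
    parameters=IndexingParameters(batch_size=100),  # point 2: larger batches
    schedule=IndexingSchedule(                      # point 4: off-peak daily runs
        interval=timedelta(days=1),
        start_time=datetime(2024, 3, 16, 2, 0, tzinfo=timezone.utc),
    ),
)
client.create_or_update_indexer(indexer)

# The first run still processes every document (and runs the skillset), but
# later scheduled runs only pick up blobs whose LastModified has changed.
```

    For point 3, one option (again an assumption about how your data is laid out) is to create several data sources that each point at a different virtual folder, using SearchIndexerDataContainer(name="documents", query="<folder-prefix>"), and pair each one with its own indexer so the partitions are processed in parallel.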

    Best,

    Grace

