Change and delete detection using indexers for Azure Storage in Azure Cognitive Search
After an initial search index is created, you might want subsequent indexer jobs to only pick up new and changed documents. For indexed content that originates from Azure Storage, change detection occurs automatically because indexers keep track of the last update using the built-in timestamps on objects and files in Azure Storage.
Although change detection is a given, deletion detection is not. An indexer doesn't track object deletion in data sources. To avoid having orphan search documents, you can implement a "soft delete" strategy that results in deleting search documents first, with physical deletion in Azure Storage following as a second step.
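The difference between the two behaviors can be illustrated with a small conceptual sketch. This is not the indexer's actual implementation; the `changed_since` helper, the blob names, and the in-memory dictionaries are hypothetical stand-ins for a storage listing and a search index. It shows that comparing timestamps against a stored high-water mark finds new and edited blobs, but a blob that has simply disappeared leaves nothing to compare, so its search document is orphaned.

```python
from datetime import datetime, timezone


def changed_since(blobs, high_water_mark):
    """Return names of blobs whose last-modified time is newer than the mark."""
    return sorted(name for name, modified in blobs.items() if modified > high_water_mark)


t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2021, 1, 2, tzinfo=timezone.utc)
t2 = datetime(2021, 1, 3, tzinfo=timezone.utc)

# Hypothetical storage listing: blob name -> last-modified timestamp.
storage = {"a.pdf": t0, "b.pdf": t0}
index = set(storage)       # the first run indexes everything
high_water_mark = t1       # remembered after the first run

# Later: a.pdf is edited, c.pdf is added, and b.pdf is deleted outright.
storage = {"a.pdf": t2, "c.pdf": t2}

to_index = changed_since(storage, high_water_mark)
index.update(to_index)
orphans = sorted(index - set(storage))

print(to_index)  # ['a.pdf', 'c.pdf']
print(orphans)   # ['b.pdf'] is still in the index although the blob is gone
```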
There are two ways to implement a soft delete strategy:
Whichever approach you choose, use consistent document keys and file structure. Changing document keys, or directory names and paths (applies to ADLS Gen2), breaks the internal tracking information that indexers use to know which content was indexed and when it was last indexed.
ADLS Gen2 allows directories to be renamed. When a directory is renamed, the timestamps for the blobs in that directory do not get updated. As a result, the indexer will not re-index those blobs. If you need the blobs in a directory to be reindexed after a directory rename because they now have new URLs, you will need to update the LastModified timestamp for all the blobs in the directory so that the indexer knows to re-index them during a future run. The virtual directories in Azure Blob Storage cannot be changed, so they do not have this issue.
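One possible way to refresh the LastModified timestamp after a directory rename is to write each blob's existing metadata back unchanged. The sketch below assumes that a metadata write counts as a modification that refreshes LastModified, and that `container_client` behaves like `ContainerClient` from the `azure-storage-blob` v12 package. The `touch_blobs` helper is a hypothetical name, not part of any SDK, and the listing on an ADLS Gen2 account may also return directory entries that you would want to skip.

```python
def touch_blobs(container_client, directory_prefix):
    """Rewrite each blob's existing metadata so that LastModified is refreshed.

    Assumes `container_client` exposes list_blobs() and get_blob_client(),
    as azure.storage.blob.ContainerClient does.
    """
    touched = []
    for item in container_client.list_blobs(name_starts_with=directory_prefix):
        blob = container_client.get_blob_client(item.name)
        # Writing the same metadata back is a write operation; the assumption
        # here is that the service updates LastModified as a side effect.
        metadata = blob.get_blob_properties().metadata or {}
        blob.set_blob_metadata(metadata=metadata)
        touched.append(item.name)
    return touched
```

With the real SDK, the client could be created with `ContainerClient.from_connection_string(...)` and passed in together with the new directory path, for example `touch_blobs(client, "renamed-dir/")`. Try it on a test container first to confirm that the timestamps change as expected.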
Native blob soft delete (preview)
For this deletion detection approach, Cognitive Search depends on the native blob soft delete feature in Azure Blob Storage to determine whether blobs have transitioned to a soft deleted state. When blobs are detected in this state, a search indexer uses this information to remove the corresponding document from the index.
Requirements for native soft delete
- Enable soft delete for blobs.
- Blobs must be in an Azure Blob Storage container. The Cognitive Search native blob soft delete policy is not supported for blobs in ADLS Gen2.
- Document keys for the documents in your index must be mapped to either a blob property or blob metadata.
- To configure support for soft delete, you must use either the preview REST API (api-version=2020-06-30-Preview) or the indexer Data Source configuration for your Cognitive Search service in the Azure portal.
How to configure deletion detection using native soft delete
In Blob storage, when enabling soft delete, set the retention policy to a value that's much higher than your indexer interval schedule. This way, if there's an issue running the indexer or if you have a large number of documents to index, there's plenty of time for the indexer to eventually process the soft deleted blobs. An Azure Cognitive Search indexer deletes a document from the index only if it processes the blob while the blob is in a soft deleted state.
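As a rough sanity check on "much higher", you could compare the retention window with the indexer interval before enabling the policy. The sketch below is only illustrative: the `retention_is_safe` helper and the default safety factor of 10 indexer runs are assumptions for the example, not recommended values.

```python
from datetime import timedelta


def retention_is_safe(retention_days, indexer_interval, safety_factor=10):
    """True when the retention window covers at least `safety_factor` indexer runs."""
    return timedelta(days=retention_days) >= safety_factor * indexer_interval


# 7 days of retention against an hourly indexer: 168 hours vs. 10 hours needed.
print(retention_is_safe(7, timedelta(hours=1)))    # True
# 1 day of retention against a 12-hour indexer: 24 hours vs. 120 hours needed.
print(retention_is_safe(1, timedelta(hours=12)))   # False
```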
In Cognitive Search, set a native blob soft deletion detection policy on the data source. You can do this either from the Azure portal or by using the preview REST API version listed in the requirements above.
On the Cognitive Search service Overview page, go to New Data Source, a visual editor for specifying a data source definition.
The following screenshot shows where you can find this feature in the portal.
On the New Data Source form, fill out the required fields, select the Track deletions checkbox, and choose Native blob soft delete. Then select Save to enable the feature when the data source is created.
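As an alternative to the portal steps, the REST route can be sketched as follows. This is a minimal example under stated assumptions: the endpoint shape, the `api-key` header, and the policy type name `#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy` should be verified against the preview REST reference, and the service, data source, and container names are placeholders. The `build_datasource_request` helper is hypothetical and only builds the request without sending it.

```python
import json
import urllib.request

API_VERSION = "2020-06-30-Preview"  # preview version named in the requirements


def build_datasource_request(service_name, api_key, datasource_name,
                             connection_string, container_name):
    """Build (but do not send) a PUT request that creates the data source."""
    url = (f"https://{service_name}.search.windows.net/datasources/"
           f"{datasource_name}?api-version={API_VERSION}")
    body = {
        "name": datasource_name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container_name},
        # Assumed policy type name for native blob soft delete detection.
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# Placeholder values; substitute your own before sending.
req = build_datasource_request(
    "my-service", "<admin-key>", "blob-datasource",
    "<storage-connection-string>", "my-container",
)
print(req.get_method(), req.full_url)
# To send the request: urllib.request.urlopen(req)
```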