Reliability in Azure AI Search

Azure AI Search is a scalable search infrastructure that indexes heterogeneous content and enables retrieval through APIs, applications, and AI agents. It's suitable for enterprise search scenarios and AI-powered customer experiences that require dynamic content generation through chat completion models. As an Azure service, AI Search provides a range of capabilities to support your reliability requirements.

When you use Azure, reliability is a shared responsibility. Microsoft provides a range of capabilities to support resiliency and recovery. You're responsible for understanding how those capabilities work within all of the services you use, and selecting the capabilities you need to meet your business objectives and uptime goals.

This article describes how to make Azure AI Search resilient to a variety of potential outages and problems, including transient faults, availability zone outages, region outages, and service maintenance. It also describes how you can use backups to recover from other types of problems, and highlights some key information about the Azure AI Search service level agreement (SLA).

Production deployment recommendations for reliability

For production workloads, we recommend that you:

Use a billable tier that has at least two replicas. This configuration makes your search service more resilient to transient faults and maintenance operations. It also meets the service-level agreement (SLA) for AI Search. The SLA requires two replicas for read-only workloads and three or more replicas for read-write workloads.
Don't use the Free tier for production use. AI Search doesn't provide an SLA for the Free tier, which is limited to one replica.

Reliability architecture overview

When you use AI Search, you create a search service. Each search service supports many search indexes that store your searchable content.

AI Search isn't designed as a primary data store. Instead, you use indexers to connect your search service to external data sources. An indexer crawls the source data, invokes skills that perform processing and enrichment, and populates your index with the skill outputs.

You also configure the number of replicas for your service. In AI Search, a replica is a copy of your service's search engine. You can think of a replica as representing a single virtual machine (VM). Each search service can have between 1 and 12 replicas.

The addition of multiple replicas allows AI Search to:

Increase the availability of your search service.
Perform maintenance on one replica while queries continue to run on other replicas.
Handle higher indexing and query workloads.
Improve resiliency by attempting to provision replicas in different availability zones, if your region supports them.

AI Search automatically assigns one replica to be the primary replica. All write operations are performed against that replica. The other replicas are used for read operations.

The following diagram illustrates how a search service with three replicas might be spread across three availability zones:

You can also configure the number of partitions, which represent the storage that the search indexes use.

It's important to understand the impact of adding replicas and partitions because they each affect read and write performance in different ways. For more information about replicas and partitions, see Estimate and manage capacity of a search service.

Resilience to transient faults

Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications can handle transient faults, usually by retrying affected requests.

All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.

AI Search indexers have built-in transient fault handling. If a data source is briefly unavailable, the indexer is designed to recover and retry. It uses change tracking to resume indexing from the last successfully indexed document.

Search services might experience transient faults during standard, unscheduled maintenance operations. Azure AI Search doesn't provide advance notification or allow scheduling of maintenance at specific times. Although every effort is made to minimize downtime, even for single-replica services, brief interruptions can still occur. To improve resiliency against these transient faults, we recommend that you use two or more replicas.

If you build any applications that interact with AI Search, they should handle transient faults. Use a retry strategy with exponential backoffs for both read and write operations.
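The retry strategy described above could be sketched as follows. This is a minimal illustration, not an official client: `TransientError`, `retry_with_backoff`, and the default delay values are hypothetical names and numbers chosen for the example. In a real application you would raise (or map to) `TransientError` only for failures that are safe to retry.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff.

    The sleep parameter exists so that the wait can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # The delay ceiling doubles on each attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

You might wrap both query and indexing calls in the same helper, for example `retry_with_backoff(lambda: run_query("hotels"))`, where `run_query` stands in for whatever function your application uses to call the search service.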

Resilience to availability zone failures

Availability zones are physically separate groups of datacenters within an Azure region. When one zone fails, services can fail over to one of the remaining zones.

AI Search supports zone redundancy, which means that your replicas can be distributed across multiple availability zones within the search service's region.

When you add two or more replicas to your service, AI Search attempts to place each replica in a different availability zone. For services that have more replicas than available zones, replicas are distributed across zones as evenly as possible.
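To make "as evenly as possible" concrete, the following sketch computes one even spread of replicas over zones. It's purely illustrative: `distribute_replicas` is a hypothetical helper, not the service's placement algorithm, and actual placement isn't guaranteed.

```python
def distribute_replicas(replica_count, zone_count):
    """Return the number of replicas per zone for an even spread."""
    if replica_count < 0 or zone_count < 1:
        raise ValueError("need a non-negative replica count and at least one zone")
    # Every zone gets the base share; the first `extra` zones get one more.
    base, extra = divmod(replica_count, zone_count)
    return [base + 1 if zone < extra else base for zone in range(zone_count)]


print(distribute_replicas(4, 3))  # [2, 1, 1]
```

With four replicas and three zones, one zone holds two replicas and the other two zones hold one each.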

The following diagram illustrates how an example search service with four replicas might be deployed across three availability zones:

Important

AI Search doesn't guarantee the exact placement of replicas. Placement is subject to capacity constraints, scaling operations, and other factors.

Requirements

Zone redundancy is automatically enabled when your search service meets all of the following criteria:

Region support: Support for availability zones depends on infrastructure and storage. For a list of supported regions, see Choose a region for AI Search.
Tier: Your service must be on the Basic tier or higher.
Number of replicas: Your service must have at least two replicas.

Note

AI Search attempts to distribute replicas across multiple zones when you have two or more replicas. However, for read-write workloads, you should use three or more replicas so that you receive the highest possible availability SLA.

Instance distribution across zones

AI Search attempts to place replicas across different availability zones. However, there are occasionally situations where all of the replicas of a search service might be placed into the same availability zone. This situation can happen when replicas are removed from your service, such as when you scale in by configuring your service to use fewer replicas. Replica removal doesn't trigger the remaining replicas to rebalance across the availability zones.

To reduce the likelihood of all of your replicas being placed into a single availability zone, you can manually trigger a scale-out operation immediately after a scale-in operation. For example, suppose that your search service has 10 replicas and you want to scale in to 7 replicas. Instead of performing a single scale operation, you can temporarily scale in to 6 replicas and then immediately scale out to 7 replicas to trigger zone rebalancing.
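The scale-in workaround above can be expressed as a two-step sequence. In this sketch, `set_replica_count` is a hypothetical callback that you supply. It's assumed to change the replica count through whatever tooling you use and to return only after the scale operation finishes. Keep in mind that the service briefly runs with one replica fewer than your target.

```python
def scale_in_with_rebalance(target_replicas, set_replica_count):
    """Scale in to one below the target, then scale out to the target."""
    if target_replicas < 2:
        raise ValueError("the workaround needs a target of at least two replicas")
    steps = [target_replicas - 1, target_replicas]
    for count in steps:
        # Hypothetical callback: applies the new replica count and waits.
        set_replica_count(count)
    return steps


# Example from the text: scale in from 10 replicas to 7 by way of 6.
applied = []
scale_in_with_rebalance(7, applied.append)
print(applied)  # [6, 7]
```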

Cost

Each search service starts with one replica. Zone redundancy requires two or more replicas, which increases the cost to run the service. To understand the billing implications of replicas, use the pricing calculator.

Configure availability zone support

If your search service meets the requirements for zone redundancy, no extra configuration is necessary. Whenever possible, AI Search attempts to place your replicas in different availability zones.

Capacity planning and management

To prepare for availability zone failure, consider overprovisioning the number of replicas. Overprovisioning allows the search service to tolerate some capacity loss and continue to function without degraded performance. Adding replicas during an outage is challenging, so overprovisioning helps ensure that your search service can handle normal request volumes, even with reduced capacity. For more information, see Manage capacity by overprovisioning.
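As a rough worked example of overprovisioning, the following sketch estimates how many replicas you'd need so that losing one zone still leaves enough capacity. It assumes replicas are spread evenly across zones, which the service attempts but doesn't guarantee, and `replicas_to_survive_zone_loss` is a hypothetical helper name.

```python
import math


def replicas_to_survive_zone_loss(required_replicas, zone_count):
    """Return the smallest total that keeps required_replicas after one zone fails."""
    if required_replicas < 1 or zone_count < 2:
        raise ValueError("need at least one required replica and two zones")
    total = required_replicas
    # Worst case with an even spread: the failed zone held
    # ceil(total / zone_count) replicas.
    while total - math.ceil(total / zone_count) < required_replicas:
        total += 1
    return total


print(replicas_to_survive_zone_loss(4, 3))  # 6
```

If normal load needs four replicas in a three-zone region, six replicas (two per zone) leave four running after a single zone outage. Check the result against the limit of 12 replicas per service, because larger workloads can produce a number that exceeds it.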

Behavior when all zones are healthy

This section describes what to expect when search services are configured for zone redundancy and all availability zones are operational.

Traffic routing between zones: AI Search automatically load balances queries across all of the available replicas, in any availability zone. It sends write operations to a single primary replica that the AI Search service selects.
Data replication between zones: Changes in data are automatically replicated between replicas across availability zones. Replication occurs asynchronously, which means that writes are committed to one primary replica before they're replicated to other replicas.

Behavior during a zone failure

This section describes what to expect when search services are configured for zone redundancy and an availability zone outage occurs.

Detection and response: AI Search is responsible for detecting a failure in an availability zone. You don't need to do anything to initiate a zone failover.

Notification: Microsoft doesn't automatically notify you when a zone is down. However, you can use Azure Resource Health to monitor the health of an individual resource, and you can set up Resource Health alerts to notify you of problems. You can also use Azure Service Health to understand the overall health of the service, including any zone failures, and you can set up Service Health alerts to notify you of problems.

Active requests: Requests that are being processed by replicas in the failed zone are terminated. Clients should retry the requests by following the guidance for handling transient faults.

Expected data loss: If the affected availability zone only contains read replicas, no data loss is expected. If the primary replica is lost because it was in the affected zone, then any write operations that haven't yet been replicated might be lost.

Expected downtime: In most situations, a zone failure isn't expected to cause downtime to your search service for read operations because read replicas in other availability zones continue to serve requests. If the primary replica is lost because it was in the affected zone, AI Search automatically promotes another replica to become the new primary so that write operations can resume. It typically takes a few seconds for the replica promotion to occur. During this time, write operations might not succeed. Ensure that your applications are prepared by following transient fault handling guidance. However, there are some unlikely situations where all of your search service's replicas might be in a single availability zone. In this scenario, you might experience downtime until the zone recovers. For more information, and to understand a workaround, see Instance distribution across zones.

Traffic rerouting: When a zone fails, AI Search detects the failure and routes requests to active replicas in the surviving zones. If the primary replica is lost, another replica is promoted to be the new primary.

Zone recovery

When the availability zone recovers, AI Search automatically restores normal operations and begins routing traffic to available replicas across all zones, including the recovered zone.

Test for zone failures

AI Search manages traffic routing for zone-redundant services. You don't need to initiate or validate any zone failure processes.

Resilience to region-wide failures

AI Search is a single-region service. If the region becomes unavailable, your search service also becomes unavailable.

Custom multi-region solutions for resiliency

You can optionally deploy multiple AI Search services in different regions. You're responsible for deploying and configuring separate services in each region. If you create an identical deployment in a secondary Azure region as part of a multi-region architecture, your application becomes less susceptible to a single-region disaster.

When you follow this approach, you must synchronize indexes across regions to recover the last application state. You must also configure load balancing and failover policies.
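One simple failover policy is to try regional services in priority order. The following sketch shows that idea on the client side. It's an assumption for illustration only, not a prescribed design: `RegionUnavailable`, `query_with_failover`, and `run_query` are hypothetical names, and many solutions handle routing in a load balancer instead.

```python
class RegionUnavailable(Exception):
    """Hypothetical error for a regional search service that can't be reached."""


def query_with_failover(regions, run_query):
    """Run run_query(region) against each region in order until one succeeds."""
    last_error = None
    for region in regions:
        try:
            # Return the region too, so that callers can log where it ran.
            return region, run_query(region)
        except RegionUnavailable as error:
            last_error = error  # Remember the failure and try the next region.
    raise RegionUnavailable("all regions failed") from last_error
```

This policy only returns useful results if the indexes in each region are kept in sync, as described above.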

To optimize the performance of your overall solution, look for opportunities to perform indexing on read-only replicas of your data sources. For example, some indexers support reading from a geo-distributed data source's read replicas.

For more information, see Multi-region deployments in Azure AI Search.

Backup and restore

Because AI Search isn't a primary data storage solution, it doesn't provide self-service backup and restore options. However, you can use the index-backup-restore sample for .NET or Python to back up your index definition and its documents to a series of JSON files, which are then used to restore the index.

If you accidentally delete an index and don't have a backup, you can still rebuild it. Rebuilding involves recreating the index on your search service and then reloading it by retrieving data from your primary data store.
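The idea of backing up documents to a series of JSON files can be sketched with the standard library alone. This isn't the index-backup-restore sample and it doesn't call AI Search. `backup_documents`, `restore_documents`, the file naming, and the batch size are all hypothetical, and `documents` is assumed to be an iterable of JSON-serializable dictionaries that you've already read from your index.

```python
import json
from pathlib import Path


def backup_documents(documents, folder, batch_size=1000):
    """Write documents to numbered JSON files, batch_size documents per file."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths, batch = [], []

    def flush():
        # Zero-padded numbering keeps the files in order when sorted by name.
        path = folder / f"documents-{len(paths):05d}.json"
        path.write_text(json.dumps(batch), encoding="utf-8")
        paths.append(path)
        batch.clear()

    for document in documents:
        batch.append(document)
        if len(batch) == batch_size:
            flush()
    if batch:
        flush()  # Write the final, partially filled batch.
    return paths


def restore_documents(folder):
    """Yield documents back from the backup files in their original order."""
    for path in sorted(Path(folder).glob("documents-*.json")):
        yield from json.loads(path.read_text(encoding="utf-8"))
```

A full backup would also save the index definition, as the sample described above does, so that the index can be recreated before the documents are reloaded.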

Service-level agreement

The service-level agreement (SLA) for Azure services describes the expected availability of each service and the conditions that your solution must meet to achieve that availability expectation. For more information, see SLAs for online services.

In AI Search, the availability SLA applies to search services that:

Are configured to use a billable tier.
Have at least two replicas for read-only workloads (queries).
Have at least three replicas for read-write workloads (queries and indexing).
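The conditions above can be captured in a small configuration check. This sketch assumes that every tier other than Free counts as billable, and `meets_availability_sla` is a hypothetical helper, not part of any SDK. Refer to the SLA itself for the authoritative terms.

```python
def meets_availability_sla(tier, replica_count, read_write):
    """Return True if the configuration satisfies the replica conditions above."""
    # Assumption: any tier other than Free is a billable tier.
    if tier.strip().lower() == "free":
        return False
    # Read-write workloads need three replicas; read-only workloads need two.
    minimum_replicas = 3 if read_write else 2
    return replica_count >= minimum_replicas


print(meets_availability_sla("Basic", 2, read_write=False))  # True
print(meets_availability_sla("Basic", 2, read_write=True))   # False
```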

Last updated on 2026-01-22