Disaster recovery for enterprise bots



To plan disaster recovery for an enterprise-grade conversational bot (chatbot), start by reviewing the service level agreement (SLA). The SLA should state the Recovery Point Objective (RPO), which is the maximum tolerable window of data loss, and the Recovery Time Objective (RTO), which is the maximum tolerable downtime, for the chatbot. Then, implement the patterns in this article to build highly available and disaster-resilient chatbot solutions that meet the SLA.
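As a rough illustration of how RPO and RTO targets can be checked against a real or rehearsed outage, the following minimal sketch compares the measured data-loss window and downtime with the SLA targets. The class name, function name, and all values are hypothetical and chosen only for illustration; they do not come from any Azure SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    """Targets taken from the SLA (the values used below are illustrative)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable downtime


def meets_targets(targets, last_backup, outage_start, service_restored):
    """Return (rpo_ok, rto_ok) for one real or rehearsed outage."""
    data_loss_window = outage_start - last_backup
    downtime = service_restored - outage_start
    return data_loss_window <= targets.rpo, downtime <= targets.rto


# Assumed SLA: lose at most 15 minutes of data, recover within 1 hour.
targets = RecoveryTargets(rpo=timedelta(minutes=15), rto=timedelta(hours=1))

drill = meets_targets(
    targets,
    last_backup=datetime(2024, 1, 1, 9, 50),
    outage_start=datetime(2024, 1, 1, 10, 0),
    service_restored=datetime(2024, 1, 1, 10, 40),
)
print(drill)  # (True, True): 10 minutes of data lost, 40 minutes of downtime
```

Running a check like this after each failover drill gives a simple record of whether the deployed solution still meets the SLA.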

The core components of a typical enterprise-grade chatbot solution in Azure are discussed in Enterprise-grade conversational bot.

Potential use cases

This solution is ideal for the telecommunications industry. This article covers the most essential design aspects, and introduces the tools needed to build a robust, secure, and actively learning bot.


The diagram below shows deployment of a chatbot solution for disaster recovery. The failover mode is active-passive in two different Azure regions.

Architecture diagram: deployment of a chatbot solution for disaster recovery, with active-passive failover mode in two different Azure regions.
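To make the active-passive failover mode concrete, here is a minimal local simulation of priority-based endpoint selection, similar in spirit to a priority routing method. It does not call Azure Traffic Manager or Azure Front Door; the `Endpoint` type, the `pick_endpoint` function, and the hostnames are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    url: str       # hypothetical hostname, for illustration only
    priority: int  # lower number = preferred (the active region)


def pick_endpoint(endpoints: List[Endpoint],
                  healthy: Dict[str, bool]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None."""
    candidates = [e for e in endpoints if healthy.get(e.name, False)]
    return min(candidates, key=lambda e: e.priority, default=None)


endpoints = [
    Endpoint("primary", "https://bot-primary.example.com", priority=1),
    Endpoint("secondary", "https://bot-secondary.example.com", priority=2),
]

# Normal operation: all traffic goes to the active (primary) region.
normal = pick_endpoint(endpoints, {"primary": True, "secondary": True})
# Regional outage: traffic fails over to the passive (secondary) region.
failover = pick_endpoint(endpoints, {"primary": False, "secondary": True})

print(normal.name)    # primary
print(failover.name)  # secondary
```

In an active-passive design the secondary stack receives traffic only while the primary is reported unhealthy, which is why the selection depends on health first and priority second.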



Disaster recovery solutions vary depending on your SLA and the Azure services you use.

Non-regional services

Azure Active Directory (Azure AD), Azure Traffic Manager, Azure Front Door, and the Azure Bot Service registration are non-regional services. They remain available across Azure geographies regardless of the availability of, or an outage in, any single region.

Regional services with automatic failover

Although you provision Azure Key Vault and Language Understanding Intelligent Service (LUIS) in a specific Azure region, these services provide automatic failover to a different Azure region. For more information, see:

Regional services without automatic failover

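The remaining regional services in this architecture, such as Azure Search, Application Insights, Azure Storage, and the App Service apps that host the bot API and QnA Maker, don't fail over automatically. Plan high availability and disaster recovery for each of them yourself.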

Keep all deployment and source code artifacts in a source code repository, and use Azure paired regions to deploy them in parallel. You can automate all the following deployment tasks and save them as part of your deployment artifacts. When you deploy these services in the two paired regions, configure your bot API environment variables to match the specific services in each Azure region.

  • Keep the primary and secondary Azure Search indexes in sync. For a sample app that backs up and restores Azure Search indexes, see QnAMakerBackupRestore on GitHub.
  • Back up Application Insights by using continuous export. Although you can't currently import the exported telemetry to another Application Insights resource, you can export into a storage account for further analysis.
  • To set up high availability and disaster recovery for Azure Storage accounts, see Disaster recovery and storage account failover.
  • Deploy the bot API and QnA Maker into an Azure App Service plan in both regions.
  • After you set up the primary and secondary stacks, use Azure Traffic Manager or Azure Front Door to configure the endpoints. Set up a routing method for both QnA Maker and the bot API.
  • Create a TLS/SSL certificate for your Traffic Manager endpoint, and bind the certificate in your App Service apps.
  • Finally, use the Traffic Manager or Azure Front Door endpoint of QnA Maker in your bot. Then, use the Traffic Manager or Azure Front Door endpoint of the bot API as the bot endpoint in the Azure Bot Service registration.
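One way to picture the per-region configuration described above is a small helper that builds the bot API's environment variables for each stack. This is only a sketch under stated assumptions: the resource names, setting keys, and endpoint URL are all hypothetical, and a real deployment would apply these values through its own deployment tooling.

```python
from typing import Dict

# Hypothetical resource names for the two paired-region stacks.
REGIONAL_RESOURCES: Dict[str, Dict[str, str]] = {
    "primary": {
        "search_service": "contoso-search-primary",
        "storage_account": "contosobotprimary",
    },
    "secondary": {
        "search_service": "contoso-search-secondary",
        "storage_account": "contosobotsecondary",
    },
}

# Hypothetical global routing endpoint (Traffic Manager or Front Door)
# that sits in front of the QnA Maker apps in both regions.
SHARED_QNA_ENDPOINT = "https://contoso-qna.example.net"


def build_app_settings(stack: str) -> Dict[str, str]:
    """Build environment variables for the bot API in one regional stack."""
    resources = REGIONAL_RESOURCES[stack]
    return {
        # Region-specific services: each stack points at its own resources.
        "SEARCH_SERVICE_NAME": resources["search_service"],
        "STORAGE_ACCOUNT_NAME": resources["storage_account"],
        # Shared endpoint: both stacks call QnA Maker through global routing.
        "QNA_ENDPOINT": SHARED_QNA_ENDPOINT,
    }


for stack in ("primary", "secondary"):
    print(stack, build_app_settings(stack))
```

Keeping a mapping like this in the source code repository alongside the other deployment artifacts lets both regions be deployed from the same automation with only the stack name changing.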


Key technologies used to implement this architecture:


This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

