Integrate ExpressRoute with disaster recovery for Azure virtual machines
This article describes how to integrate Azure ExpressRoute with Azure Site Recovery, when you set up disaster recovery for Azure virtual machines to a secondary Azure region.
Site Recovery enables disaster recovery of Azure virtual machines by replicating Azure virtual machine data to Azure.
- If Azure virtual machines use Azure managed disks, virtual machine data is replicated to a replicated managed disk in the secondary region.
- If Azure virtual machines don't use managed disks, virtual machine data is replicated to an Azure storage account.
- Replication endpoints are public, but replication traffic for Azure virtual machines doesn't cross the internet.
ExpressRoute enables you to extend on-premises networks into the Microsoft Azure cloud over a private connection, facilitated by a connectivity provider. If you have ExpressRoute configured, it integrates with Site Recovery as follows:
- During replication between Azure regions: Replication traffic for Azure virtual machine disaster recovery is within Azure only, and ExpressRoute isn't needed or used for replication. However, if you're connecting from an on-premises site to the Azure virtual machines in the primary Azure site, there are many issues to be aware of when you're setting up disaster recovery for those Azure virtual machines.
- Failover between Azure regions: When outages occur, you fail over Azure virtual machines from the primary to secondary Azure region. After failing over to a secondary region, there are many steps to take in order to access the Azure virtual machines in the secondary region using ExpressRoute.
Before you begin
Before you begin, make sure you understand the following concepts:
- ExpressRoute circuits
- ExpressRoute routing domains
- ExpressRoute locations.
- Azure virtual machine replication architecture
- How to set up replication for Azure virtual machines.
- How to fail over Azure virtual machines.
General recommendations
For best practice, and to ensure efficient Recovery Time Objectives (RTOs) for disaster recovery, we recommend you do the following when you set up Site Recovery to integrate with ExpressRoute:
- Provision networking components before failover to a secondary region:
- When you enable replication for Azure virtual machines, Site Recovery can automatically deploy networking resources using the source network settings. For example, networks, subnets, and gateways in the target Azure region.
- Site Recovery can't automatically set up networking resources such as VNet gateways.
- We recommend you provision these extra networking resources before failover. A small downtime is associated with this deployment, and it can impact the overall recovery time, if you didn't account for it during deployment planning.
- Run regular disaster recovery drills:
- A drill validates your replication strategy without data loss or downtime, and doesn't affect your production environment. It helps avoid last-minute configuration issues that can adversely impact RTO.
- When you run a test failover for the drill, we recommend using a separate Azure virtual machine network instead of the default network set up during replication.
- Use different IP address spaces if you have a single ExpressRoute circuit.
- We recommend that you use a different IP address space for the target virtual network. This avoids issues when establishing connections during regional outages.
- If you can't use a separate address space, be sure to run the disaster recovery drill test failover on a separate test network with different IP addresses. You can’t connect two VNets with overlapping IP address space to the same ExpressRoute circuit.
Replicate Azure virtual machines when using ExpressRoute
If you want to set up replication for Azure virtual machines in a primary site, and you're connecting to these virtual machines from your on-premises site over ExpressRoute, here's what you need to do:
- Enable replication for each Azure virtual machine.
- Optionally let Site Recovery set up networking:
- When you configure and enable replication, Site Recovery sets up networks, subnets, and gateway subnets in the target Azure region to match those in the source region. Site Recovery also maps between the source and target virtual networks.
- If you don't want Site Recovery to do this automatically, create the target-side network resources before you enable replication.
- Create other networking elements:
- Site Recovery doesn't create route tables, VNet gateways, VNet gateway connections, VNet peering, or other networking resources and connections in the secondary region.
- You need to create these extra networking elements in the secondary region, anytime before running a failover from the primary region.
- You can use recovery plans and automation scripts to set up and connect these networking resources.
- If you have a network virtual appliance (NVA) deployed to control the flow of network traffic:
- Azure's default system route for Azure virtual machine replication is 0.0.0.0/0.
- Typically, NVA deployments also define a default route (0.0.0.0/0) that forces outbound Internet traffic to flow through the NVA. The default route is used when no other specific route configuration can be found.
- If so, the NVA might be overloaded if all replication traffic passes through the NVA.
- The same limitation also applies when using default routes for routing all Azure virtual machine traffic to on-premises deployments.
- In this scenario, we recommend that you create a network service endpoint in your virtual network for the Microsoft.Storage service, so that the replication traffic doesn't leave Azure boundary.
Replication example
In typical enterprise deployments, workloads are distributed across multiple Azure VNets with a central hub for internet and on-premises connectivity. This setup often uses a hub-and-spoke topology with ExpressRoute.
- Region. Apps are deployed in the Azure East Asia region.
- Spoke vNets. Apps are deployed in two spoke vNets:
- Source vNet1: 10.1.0.0/24.
- Source vNet2: 10.2.0.0/24.
- Each spoke virtual network is connected to Hub vNet.
- Hub vNet. There's a hub vNet Source Hub vNet: 10.10.10.0/24.
- This hub vNet acts as the gatekeeper.
- All communications across subnets go through this hub.
- Hub vNet subnets. The hub vNet has two subnets:
- NVA subnet: 10.10.10.0/25. This subnet contains an NVA (10.10.10.10).
- Gateway subnet: 10.10.10.128/25. This subnet contains an ExpressRoute gateway connected to an ExpressRoute connection that routes to the on-premises site via a private peering routing domain.
- The on-premises datacenter has an ExpressRoute circuit connection through a partner edge in Hong Kong Special Administrative Region.
- All routing is controlled through Azure route tables (UDR).
- All outbound traffic between vNets, or to the on-premises datacenter is routed through the NVA.
Hub and spoke peering settings
Spoke to hub
Direction | Setting | State |
---|---|---|
Spoke to hub | Allow virtual network address | Enabled |
Spoke to hub | Allow forwarded traffic | Enabled |
Spoke to hub | Allow gateway transit | Disabled |
Spoke to hub | Use remove gateways | Enabled |
Hub to spoke
Direction | Setting | State |
---|---|---|
Hub to spoke | Allow virtual network address | Enabled |
Hub to spoke | Allow forwarded traffic | Enabled |
Hub to spoke | Allow gateway transit | Enabled |
Hub to spoke | Use remove gateways | Disabled |
Example steps
In our example, the following should happen when enabling replication for Azure virtual machines in the source network:
- You enable replication for a virtual machine.
- Site Recovery creates replica vNets, subnets, and gateway subnets in the target region.
- Site Recovery creates mappings between the source networks and the replica target networks it creates.
- You manually create virtual network gateways, virtual network gateway connections, virtual network peering, or any other networking resources or connections.
Fail over Azure virtual machines when using ExpressRoute
After you fail Azure virtual machines over to the target Azure region using Site Recovery, you can access them using ExpressRoute private peering.
- You need to connect ExpressRoute to the target vNet with a new connection. The existing ExpressRoute connection isn't automatically transferred.
- The way in which you set up your ExpressRoute connection to the target vNet depends on your ExpressRoute topology.
Access with two circuits
Two circuits with two peering locations
This configuration helps protect ExpressRoute circuits against regional disaster. If your primary peering location goes down, connections can continue from the other location.
- The circuit connected to the production environment is usually the primary. The secondary circuit typically has lower bandwidth, which can be increased if a disaster occurs.
- After failover, you can establish connections from the secondary ExpressRoute circuit to the target vNet. Alternatively, you can have connections set up and ready in disaster, to reduce overall recovery time.
- With simultaneous connections to both primary and target vNets, make sure that your on-premises routing only uses the secondary circuit and connection after failover.
- The source and target vNets can receive new IP addresses, or keep the same ones, after failover. In both cases, the secondary connections can be established prior to failover.
Two circuits with single peering location
This configuration helps protect against failure of the primary ExpressRoute circuit, but not if the single ExpressRoute peering location goes down, impacting both circuits.
- You can have simultaneous connections from the on-premises datacenter to source vNEt with the primary circuit, and to the target vNet with the secondary circuit.
- With simultaneous connections to primary and target, make sure that on-premises routing only uses the secondary circuit and connection after failover.
- You can’t connect both circuits to the same vNet when circuits are created at the same peering location.
Access with a single circuit
In this configuration there's only one Expressroute circuit. Although the circuit has a redundant connection in case one goes down, a single route circuit won't provide resilience if your peering region goes down. Note that:
- If the target Azure region isn't in the same location as the source, you need to enable ExpressRoute Premium if you’re using a single ExpressRoute circuit. Learn about ExpressRoute locations and ExpressRoute pricing.
- You can’t connect source and target vNets simultaneously to the circuit if the same IP address space is used on the target region. In this scenario:
- Disconnect the source side connection, and then establish the target side connection. This connection change can be scripted as part of a Site Recovery recovery plan. Note that:
- In a regional failure, if the primary region is inaccessible, the disconnect operation could fail. This could impact connection creation to the target region.
- If you created the connection in the target region, and primary region recovers later, you might experience packet drops if two simultaneous connections attempt to connect to the same address space.
- To prevent this, terminate the primary connection immediately.
- After virtual machine failback to the primary region, the primary connection can again be established, after you disconnect the secondary connection.
- Disconnect the source side connection, and then establish the target side connection. This connection change can be scripted as part of a Site Recovery recovery plan. Note that:
- If a different address space is used on the target vNet, you can simultaneously connect to the source and target vNets from the same ExpressRoute circuit.
Failover example
In our example, we're using the following topology:
- Two different ExpressRoute circuits in two different peering locations.
- Retain private IP addresses for the Azure virtual machines after failover.
- The target recovery region is Azure SouthEast Asia.
- A secondary ExpressRoute circuit connection is established through a partner edge in Singapore.
For a simple topology that uses a single ExpressRoute circuit, with same IP address after failover, review this article.
Example steps
To automate recovery in this example, here's what you need to do:
Follow the steps to set up replication.
Fail over the Azure virtual machines, with these extra steps during or after the failover.
a. Create the Azure ExpressRoute Gateway in the target region hub VNet. This is need to connect the target hub vNet to the ExpressRoute circuit.
b. Create the connection from the target hub vNet to the target ExpressRoute circuit.
c. Set up the VNet peerings between the target region’s hub and spoke virtual networks. The peering properties on the target region are the same as those on the source region.
d. Set up the UDRs in the hub VNet, and the two spoke VNets.
- The properties of the target side UDRs are the same as those on the source side when using the same IP addresses.
- With different target IP addresses, the UDRs should be modified accordingly.
The above steps can be scripted as part of a recovery plan. Depending on the application connectivity and recovery time requirements, the above steps can also be completed prior to starting the failover.
After recovery
After recovering the virtual machines and completing connectivity, the recovery environment is as follows.
Next steps
Learn more about using recovery plans to automate app failover.