Hi @Abrar Adil S,
Welcome to Microsoft Q&A Platform.
It sounds like you expected seamless failover across two different ExpressRoute circuits but saw a full outage when one circuit went down. Here's what’s likely happening and how to architect for true high availability:
Why you didn’t see automatic failover
- By default, each ExpressRoute circuit is active-active only across its own two redundant connections. Azure treats the two circuits you added as separate resources and will prefer one path based on BGP attributes (weight, local-pref, AS-path prepends or communities). If the “preferred” circuit goes into a failed provisioning state (rather than just a flapping BGP session), Azure doesn’t immediately withdraw all routes or shift traffic to the other circuit. You often have to “Refresh” or manually disable that peering to force a switch.
- If you’ve inadvertently prepended AS paths or set unequal local-preference on the second circuit, Azure will keep advertising and trying to use the primary until you tear down the failed one.
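To make the path-preference point concrete, here is a minimal Python sketch of a simplified BGP comparison. It only models local preference and AS-path length, and it is a toy model rather than Azure's actual route-selection code. The circuit names, the AS number 65001 and the `Route` and `best_paths` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    circuit: str     # hypothetical label for an ExpressRoute circuit
    local_pref: int  # higher is preferred
    as_path: tuple   # shorter is preferred


def best_paths(routes):
    """Return the routes that tie for best under a simplified BGP comparison."""
    # Rank by highest local preference first, then by shortest AS path.
    def key(r):
        return (-r.local_pref, len(r.as_path))

    top = min(key(r) for r in routes)
    return [r for r in routes if key(r) == top]


# circuit-b prepends its AS twice, so only circuit-a is selected.
biased = [
    Route("circuit-a", 100, (65001,)),
    Route("circuit-b", 100, (65001, 65001, 65001)),
]
print([r.circuit for r in best_paths(biased)])  # ['circuit-a']

# With identical attributes both circuits tie and can share traffic.
equal = [
    Route("circuit-a", 100, (65001,)),
    Route("circuit-b", 100, (65001,)),
]
print([r.circuit for r in best_paths(equal)])   # ['circuit-a', 'circuit-b']
```

In the biased case the second circuit carries no traffic while the first one is still advertised, which is why an accidental prepend can hide the backup path.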
Is this expected behavior?
- Yes. ExpressRoute’s cross-circuit failover relies on clean BGP convergence. A circuit in a failed provisioning state doesn’t behave like a simple BGP flap. Azure will hold onto the resource until you disable/refresh that peering. This is a known limitation: you won’t get an instant automatic switch until routes are actually withdrawn.
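The dependence on route withdrawal can be illustrated with a small Python sketch. It assumes a simple ordered preference list and a set of currently advertised routes. The names `next_hop`, `advertised` and `healthy` are hypothetical, and this is not how Azure implements its routing internally.

```python
def next_hop(advertised, preference):
    """Pick the most preferred circuit that is still advertising the prefix."""
    for circuit in preference:
        if circuit in advertised:
            return circuit
    return None  # no route left at all


preference = ["circuit-a", "circuit-b"]  # circuit-a is the preferred path
advertised = {"circuit-a", "circuit-b"}  # both routes are still in the table
healthy = {"circuit-b"}                  # circuit-a has actually failed

# The route through circuit-a is still present, so it is still chosen
# even though that circuit cannot forward traffic.
chosen = next_hop(advertised, preference)
print(chosen, chosen in healthy)  # circuit-a False

# Once the stale route is withdrawn, selection falls through to circuit-b.
advertised.discard("circuit-a")
chosen = next_hop(advertised, preference)
print(chosen, chosen in healthy)  # circuit-b True
```

The outage window in this toy model is the time between the circuit failing and its route being removed, which matches the manual disable/refresh step described above.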
Best practices for true HA and resiliency
a. Active-Active at every level
- Provision two fully independent circuits (from different providers) into different peering locations or metros.
- Connect both circuits to the virtual network through an ExpressRoute virtual network gateway, ideally a zone-redundant SKU (ErGw1AZ, ErGw2AZ or ErGw3AZ), so that both circuits are active at the same time.
b. Equalize BGP metrics
- Don’t prepend on one circuit or bias local-pref. Advertise identical prefixes and AS-paths so Azure load-balances per-flow across both circuits.
c. Zone-redundant Gateway
- Deploy your ExpressRoute/VPN gateway across Availability Zones. This protects against a single AZ failure. https://docs.microsoft.com/azure/vpn-gateway/create-zone-redundant-vnet-gateway
d. Backup over Internet VPN
- Configure a Site-to-Site VPN to coexist with ExpressRoute as a passive backup for private peering. If both ExpressRoute circuits go down, your on-premises network can still reach Azure over IPsec. https://docs.microsoft.com/azure/expressroute/use-s2s-vpn-as-backup-for-expressroute-privatepeering
e. Test and validate
- Use the ExpressRoute resiliency validation preview to simulate site/circuit failures and measure BGP convergence times. https://docs.microsoft.com/azure/expressroute/resiliency-validation
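As a rough illustration of the per-flow load balancing mentioned under "Equalize BGP metrics", the Python sketch below hashes a flow's 5-tuple onto one of several equal-cost circuits. The SHA-256 hash, the addresses and the `pick_circuit` name are assumptions made for illustration. The hashing scheme Azure actually uses is not described in this thread.

```python
import hashlib
from collections import Counter


def pick_circuit(flow, circuits):
    """Map a flow 5-tuple onto one of the equal-cost circuits."""
    # Hash the whole tuple so every packet of a flow takes the same path.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return circuits[int.from_bytes(digest[:4], "big") % len(circuits)]


circuits = ["circuit-a", "circuit-b"]

# 1000 hypothetical TCP flows that differ only in source port.
flows = [("10.0.0.4", "172.16.0.10", 40000 + i, 443, "tcp") for i in range(1000)]

# Count how many flows land on each circuit.
share = Counter(pick_circuit(f, circuits) for f in flows)
print(share)

# If circuit-a's routes are withdrawn, every flow maps to circuit-b.
print({pick_circuit(f, ["circuit-b"]) for f in flows})
```

Because the choice is made per flow rather than per packet, a single large flow stays on one circuit, so an even split is only expected across many flows.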
Reference
https://docs.microsoft.com/azure/expressroute/design-architecture-for-resiliency
Please “up-vote” wherever the information provided helps you; this can be beneficial to other community members.