Reliability in Azure Chaos Studio

This article describes reliability and availability zones support in Azure Chaos Studio. For a more detailed overview of reliability in Azure, see Azure reliability.

Availability zone support

Azure availability zones are at least three physically separate groups of datacenters within each Azure region. Datacenters within each zone are equipped with independent power, cooling, and networking infrastructure. Availability zones are designed so that if one zone fails, regional services, capacity, and high availability are supported by the remaining two zones.

Failures can range from software and hardware failures to events such as earthquakes, floods, and fires. Tolerance to failures is achieved with redundancy and logical isolation of Azure services. For more detailed information on availability zones in Azure, see Regions and availability zones.

Azure availability zones-enabled services are designed to provide the right level of reliability and flexibility. They can be configured in two ways: zone redundant, with automatic replication across zones, or zonal, with instances pinned to a specific zone. You can also combine these approaches. For more information on zonal vs. zone-redundant architecture, see Recommendations for using availability zones and regions.

Azure Chaos Studio supports zone redundancy as the default configuration within a region. Chaos Studio resources are automatically duplicated or distributed across different zones.

Prerequisites

The following regions support availability zones for Chaos Studio:

Americas       Europe           Asia Pacific
Brazil South   Sweden Central   Australia East
Central US     UK South         Japan East
East US                         Southeast Asia
East US 2
West US 2
West US 3

For detailed information on the regional availability model for Azure Chaos Studio, see Regional availability of Azure Chaos Studio.

Zone down experience

During a zone-wide outage, you should anticipate a brief degradation in performance and availability as the service transitions to a functioning zone. Recovery does not depend on the restoration of the affected zone, because Microsoft-managed services mitigate zone losses by using capacity from the other zones. A chaos experiment could encounter errors or disruptions during the outage, but crucial experiment metadata, historical data, and specific details should remain accessible, and the service should not experience a complete outage.
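
Because the degradation described above is expected to be brief, client-side automation that calls the service can often ride it out by retrying with a growing delay. The following is a generic sketch of retry with exponential backoff, not a Chaos Studio API: `call_with_retry`, its parameters, and the choice of `ConnectionError` and `TimeoutError` as the retryable errors are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on the given errors with exponential backoff.

    `operation` is a hypothetical zero-argument callable that wraps whatever
    request your tooling makes. Which exceptions count as transient is an
    assumption; adjust `retry_on` for the client library you actually use.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. Errors outside `retry_on` propagate immediately, so genuine failures are not hidden behind retries.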

Cross-region disaster recovery and business continuity

Chaos Studio supports single-region geography only, and doesn't support service-enabled cross-region failover.

Next steps