Reliability in Elastic SAN

This article describes reliability support in Azure Elastic SAN and covers both regional resiliency with availability zones and disaster recovery and business continuity.

Availability zone support

Azure availability zones are at least three physically separate groups of datacenters within each Azure region. Datacenters within each zone are equipped with independent power, cooling, and networking infrastructure. In the case of a local zone failure, availability zones are designed so that if the one zone is affected, regional services, capacity, and high availability are supported by the remaining two zones.

Failures can range from software and hardware failures to events such as earthquakes, floods, and fires. Tolerance to failures is achieved with redundancy and logical isolation of Azure services. For more detailed information on availability zones in Azure, see Regions and availability zones.

Azure availability zones-enabled services are designed to provide the right level of reliability and flexibility. They can be configured in two ways. They can be either zone redundant, with automatic replication across zones, or zonal, with instances pinned to a specific zone. You can also combine these approaches. For more information on zonal vs. zone-redundant architecture, see Recommendations for using availability zones and regions.

Azure Elastic SAN supports availability zone deployment with locally redundant storage (LRS) and regional deployment with zone-redundant storage (ZRS).

Prerequisites

LRS and ZRS Elastic SAN are currently only available in a subset of regions. For a list of regions, see Scale targets for Elastic SAN.

Create a resource using availability zones

To create an Elastic SAN with an availability zone enabled, see Deploy an Elastic SAN.

Zone down experience

When deploying an Elastic SAN, if you select ZRS for your SAN's redundancy option, zonal failover is supported by the platform without manual intervention. An elastic SAN using ZRS is designed to self-heal and rebalance itself to take advantage of healthy zones automatically.

If you deployed an LRS elastic SAN, you may need to deploy a new SAN, using snapshots exported to managed disks.

Low-latency design

The latency differences between an elastic SAN on LRS and an elastic SAN on ZRS isn't particularly high. However, for workloads sensitive to latency spikes, consider an elastic SAN on LRS since it offers the lowest latency.

Availability zone migration

To migrate an elastic SAN on LRs to ZRS, you must snapshot your elastic SAN's volumes, export them to managed disk snapshots, deploy an elastic SAN on ZRS, and then create volumes on the SAN on ZRS using those disk snapshots. To learn how to use snapshots (preview), see Snapshot Azure Elastic SAN volumes (preview).

Disaster recovery and business continuity

Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see Recommendations for designing a disaster recovery strategy.

When it comes to DR, Microsoft uses the shared responsibility model. In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fall back from a failed region to cross-replicate to another enabled region. For those services, you are responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR and you can use service-specific features to support fast recovery to help develop your DR plan.

Single and Multi-region disaster recovery

For Azure Elastic SAN, you're responsible for the DR experience. You can take snapshots of your volumes and export them to managed disk snapshots. Then, you can copy an incremental snapshot to a new region to store your data is in a region other than the region your elastic SAN is in. You should export to regions that are geographically distant from your primary region to reduce the possibility of multiple regions being affected due to a disaster.

Outage detection, notification, and management

You can find outage declarations in Service Health - Microsoft Azure.

Capacity and proactive disaster recovery resiliency

Microsoft and its customers operate under the Shared Responsibility Model. Shared responsibility means that for customer-enabled DR (customer-responsible services), you must address DR for any service you deploy and control. You should prevalidate any service you deploy will work with Elastic SAN. To ensure that recovery is proactive, you should always predeploy secondaries because there's no guarantee of capacity at time of impact for those who haven't preallocated.

Next steps