Reliability in Azure Data Manager for Energy

This article describes reliability support in Azure Data Manager for Energy, and covers both regional resiliency with availability zones and cross-region resiliency with disaster recovery. For a more detailed overview of reliability in Azure, see Azure reliability.

Availability zone support

Azure availability zones are at least three physically separate groups of datacenters within each Azure region. Datacenters within each zone are equipped with independent power, cooling, and networking infrastructure. In the case of a local zone failure, availability zones are designed so that if the one zone is affected, regional services, capacity, and high availability are supported by the remaining two zones.

Failures can range from software and hardware failures to events such as earthquakes, floods, and fires. Tolerance to failures is achieved with redundancy and logical isolation of Azure services. For more detailed information on availability zones in Azure, see Regions and availability zones.

Azure availability zones-enabled services are designed to provide the right level of reliability and flexibility. They can be configured in two ways. They can be either zone redundant, with automatic replication across zones, or zonal, with instances pinned to a specific zone. You can also combine these approaches. For more information on zonal vs. zone-redundant architecture, see Recommendations for using availability zones and regions.

Azure Data Manager for Energy supports zone-redundant instances by default and there's no additional configuration required.

Prerequisites

The Azure Data Manager for Energy supports availability zones in the following regions:

Americas Europe Asia Pacific
South Central US North Europe Australia East
East US West Europe
Brazil South

Zone down experience

During a zone-wide outage, no action is required during zone recovery. There may be a brief degradation of performance until the service self-heals and rebalances underlying capacity to adjust to healthy zones. During this period, you may experience 5xx errors and you may have to retry API calls until the service is restored.

Cross-region disaster recovery and business continuity

Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see Recommendations for designing a disaster recovery strategy.

When it comes to DR, Microsoft uses the shared responsibility model. In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fall back from a failed region to cross-replicate to another enabled region. For those services, you are responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR and you can use service-specific features to support fast recovery to help develop your DR plan.

Disaster recovery in multi-region geography

Azure Data Manager for Energy is a regional service and, therefore, is susceptible to region-down service failures. Azure Data Manager for Energy follows an active-passive failover configuration to recover from regional disaster. An active-passive configuration keeps warm Azure Data Manager for Energy resource running in the secondary region, but doesn't send traffic there unless the primary region fails.

Diagram of Azure data manager for energy cross region disaster recovery workflow.

Below is the list of primary and secondary regions for regions where disaster recovery is supported:

Geography Primary Secondary
Americas South Central US North Central US
Americas East US West US
Americas Brazil South*
Europe North Europe West Europe
Europe West Europe North Europe
Asia Pacific Australia East Australia

(*) These regions are restricted in supporting customer scenarios for disaster recovery. For more information please contact your Microsoft sales or customer representatives.

Azure Data Manager for Energy uses Azure Storage, Azure Cosmos DB and Elasticsearch index as underlying data stores for persisting your data partition data. These data stores offer high durability, availability, and scalability. Azure Data Manager for Energy uses geo-zone-redundant storage or GZRS to automatically replicate data to a secondary region that's hundreds of miles away from the primary region. The same security features enabled in the primary region (for example, encryption at rest using your encryption key) to protect your data are applicable to the secondary region. Similarly, Azure Cosmos DB is a globally distributed data service, which replicates the metadata (catalog) across regions. Elasticsearch index snapshots are taken at regular intervals and geo-replicated to the secondary region. All inflight data are ephemeral and therefore subject to loss. For example, in-transit data that is part of an on-going ingestion job that isn't persisted yet is lost, and you must restart the ingestion process upon recovery.

Important

In the following regions, disaster recovery is not available. For more information please contact your Microsoft sales or customer representative.

  1. Brazil South

Set up disaster recovery and outage detection

Azure Data Manager for Energy service continuously monitors service health in the primary region. If a hard service down failure is detected in the primary region, we attempt recovery before initiating failover to the secondary region on your behalf. We will notify you about the failover progress. Once the failover completes, you could connect to the Azure Data Manager for Energy resource in the secondary region and continue operations. However, there could be slight degradation in performance due to any capacity constraints in the secondary region.

Managing the resources in your subscription

You must handle the failover of your business apps connecting to Azure Data Manager for Energy resource and hosted in the same primary region. Additionally, you're responsible for recovering any diagnostic logs stored in your Log Analytics Workspace.

If you set up private links to your Azure Data Manager for Energy resource in the primary region, then you must create a secondary private endpoint to the same resource in the paired region.

Caution

If you don't enable public access networks or create a secondary private endpoint before an outage, you'll lose access to the failed over Azure Data Manager for Energy resource in the secondary region. You will be able to access the Azure Data Manager for Energy resource only after the primary region failback is complete.

Important

After failover and until the primary region failback completes, you will be unable to perform state modifications to Azure Data Manager for Energy resource created in your subscription. For example,

  • you cannot Enable or Disable public access networks.
  • you cannot Approve or Reject private endpoint connection to Azure Data Manager for Energy resource
  • you cannot create a new data partition.

Next steps