Reliability in Azure Storage Mover

This article describes reliability support in Azure Storage Mover and covers both intra-regional resiliency with availability zones and cross-region disaster recovery and business continuity. For a more detailed overview of reliability principles in Azure, see Azure reliability.

Availability zone support

Azure availability zones are at least three physically separate groups of datacenters within each Azure region. Datacenters within each zone are equipped with independent power, cooling, and networking infrastructure. Availability zones are designed so that if one zone is affected by a local failure, the remaining two zones continue to support regional services, capacity, and high availability.

Failures can range from software and hardware failures to events such as earthquakes, floods, and fires. Tolerance to failures is achieved with redundancy and logical isolation of Azure services. For more detailed information on availability zones in Azure, see Regions and availability zones.

Azure availability zones-enabled services are designed to provide the right level of reliability and flexibility. They can be configured in two ways: zone redundant, with automatic replication across zones, or zonal, with instances pinned to a specific zone. You can also combine these approaches. For more information on zonal vs. zone-redundant architecture, see Recommendations for using availability zones and regions.

Azure Storage Mover supports a zone-redundant deployment model.

When you deploy an Azure Storage Mover resource, you must select a particular region in which the resource's instance metadata is stored.

If the region supports availability zones, the instance metadata is automatically replicated across multiple availability zones within that region.

Important

Azure Storage Mover instance metadata includes projects, endpoints, agents, job definitions, and job run history, but doesn't include the actual data to be migrated. Azure storage accounts that are used as migration targets have their own reliability support.

Prerequisites

Zone down experience

No action is required from you during a zone-wide outage or during zone recovery. Azure Storage Mover is designed to self-heal and rebalance itself automatically to take advantage of the healthy zones.

Any migration target storage account may require its own recovery steps. This requirement depends on the redundancy options chosen for each storage account. See the storage account disaster recovery guide to determine whether more steps are necessary.

If locally redundant storage (LRS) was chosen instead of a more resilient redundancy option, you may need to create a new storage account for use in migrations during the outage.

Cross-region disaster recovery and business continuity

Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see Recommendations for designing a disaster recovery strategy.

When it comes to DR, Microsoft uses the shared responsibility model. In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data to another enabled region or fail over to it from a failed region. For those services, you're responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR, and you can use these service-specific features to support fast recovery as you develop your DR plan.

When a Storage Mover agent is registered, it connects to the region in which the Storage Mover resource is deployed. If an agent's Azure region experiences an outage, the agent itself isn't affected, but management operations that rely on Azure may be unable to complete. In addition, any active data migrations to storage accounts located within the affected region may fail.

Storage Mover supports two forms of disaster recovery:

  • Azure initiated disaster recovery
  • Customer initiated disaster recovery

Important

Disaster recovery for on-premises data sources is the responsibility of the customer.

Azure initiated disaster recovery

Azure initiated disaster recovery is only applicable to those regions that have region pairs. When cross-region replication is utilized, instance metadata is replicated to each region, but is never permitted to leave the geography.

Azure Storage Mover uses Azure Cosmos DB to store instance metadata. Data loss may occur only with an unrecoverable disaster in Azure Cosmos DB. For more information, see Region outages. Azure initiated recovery is active-passive, and full recovery of a region may take up to 24 hours.

Customer initiated disaster recovery

Customer initiated disaster recovery isn't restricted to paired regions.

Before a regional outage occurs:

  • Deploy a zone-redundant Storage Mover by creating Storage Mover resources in a region that supports availability zones.

  • Periodically, either on a schedule or after you make substantial changes, take a snapshot of your Storage Mover resources. A version control system is a good way to store the snapshots and track their history. You'll use the last good snapshot in the event of a disaster where you need to recover your resources in a new region.

During a regional outage:

You can do one of two things:

  • Choose to wait for Azure to recover the region.
  • Minimize downtime by redeploying your resources to a different region. Since access to your resources may be impacted during an outage, you'll want to use the last good snapshot of your resources.

Tip

Either strategy may still require you to take further steps before a disaster occurs, so be sure to review and plan accordingly.

Deploy resources to a different region

See the documentation on exporting templates for further instructions on exporting resources as an Azure Resource Manager (ARM) template.

If your Storage Mover and related resources reside in a resource group with no extra resources, you should export the entire resource group to capture the current state. However, if your resource group contains unrelated resources, you may need to remove or otherwise exclude those resources from the template.
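As a rough illustration, a resource group export can be scripted with the Export-AzResourceGroup cmdlet from the Az.Resources module. This is a minimal sketch that assumes you've already signed in with Connect-AzAccount. The output file name is a hypothetical placeholder, not a value taken from this article.

```powershell
## Export the current state of a resource group as an ARM template (sketch)
$resourceGroupName = "[Your resource group name]"
$snapshotPath      = "./storage-mover-template.json"   # hypothetical output file name

# -Force overwrites an existing file at the same path without prompting.
Export-AzResourceGroup `
    -ResourceGroupName $resourceGroupName `
    -Path $snapshotPath `
    -Force
```

Committing the exported file to a version control system after each run is one way to keep a history of snapshots.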

Existing agents can't be redeployed to a different region. If the region in which they were originally configured experiences an outage, it may not be possible to completely unregister and re-register the agent. This document assumes that new agents are registered within a new region.

To use the exported template for disaster recovery, a few changes to the template are required.

  • First, remove any Microsoft.StorageMover/agents and Microsoft.HybridCompute/machines resources from the template. Be sure to remove any dependency references to these resources as well.
  • Next, remove the agentResourceId property from all job definitions. You'll need to assign them to a new agent after deployment.
  • After removing all references to agent and Hybrid Compute machine resources, update the location property of the top level Storage Mover resource. Replace the name of the currently deployed region with the name of the new region.
  • Finally, determine whether to keep the existing storage account resource ID. If necessary, replace it with a different storage account.
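The first three changes above can be scripted. The following is a sketch rather than a definitive implementation. The function name Convert-StorageMoverTemplate, the file names, and the region are hypothetical. The resource type patterns are assumptions about how agents and machines appear in an exported template, so compare them against your own template before relying on the result. The storage account decision in the final step is left as a manual edit.

```powershell
## Prepare an exported ARM template for redeployment to a new region (sketch)
function Convert-StorageMoverTemplate {
    param(
        [Parameter(Mandatory)] [psobject] $Template,
        [Parameter(Mandatory)] [string]   $NewLocation
    )

    # Assumed type patterns for agent and Hybrid Compute machine resources.
    $agentOrMachine = '^Microsoft\.StorageMover/(storageMovers/)?agents$|^Microsoft\.HybridCompute/machines(/|$)'
    # Assumed pattern for dependsOn expressions that reference those resources.
    $dependencyRef  = 'Microsoft\.StorageMover/(storageMovers/)?agents|Microsoft\.HybridCompute/machines'

    # Remove the agent and machine resources.
    $kept = @($Template.resources | Where-Object { $_.type -notmatch $agentOrMachine })

    foreach ($resource in $kept) {
        # Remove dependency references to the removed resources.
        if ($resource.PSObject.Properties['dependsOn']) {
            $resource.dependsOn = @($resource.dependsOn | Where-Object { $_ -notmatch $dependencyRef })
        }
        # Remove agentResourceId from job definitions.
        if ($resource.type -like '*/jobDefinitions' -and $resource.PSObject.Properties['properties']) {
            if ($resource.properties.PSObject.Properties['agentResourceId']) {
                $resource.properties.PSObject.Properties.Remove('agentResourceId')
            }
        }
        # Point the top-level Storage Mover resource at the new region.
        if ($resource.type -eq 'Microsoft.StorageMover/storageMovers') {
            $resource.location = $NewLocation
        }
    }

    $Template.resources = $kept
    return $Template
}

$templatePath = "./template.json"      # hypothetical path to the exported template
$outputPath   = "./template.dr.json"   # hypothetical path for the edited copy
$newLocation  = "westus2"              # example region name

if (Test-Path $templatePath) {
    $template = Get-Content -Raw -Path $templatePath | ConvertFrom-Json
    $updated  = Convert-StorageMoverTemplate -Template $template -NewLocation $newLocation
    # A high -Depth value keeps nested template objects from being truncated.
    $updated | ConvertTo-Json -Depth 100 | Set-Content -Path $outputPath
}
```

Writing the result to a separate file leaves the original snapshot untouched, which makes it easier to compare the two versions before deployment.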

After completing the previous steps and verifying that the template parameters are correct, the template is ready for deployment to a new region. You should deploy the template to a new resource group that has the same default region as the location property in the template.
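For illustration, the deployment might look like the following sketch, which uses the New-AzResourceGroup and New-AzResourceGroupDeployment cmdlets from the Az.Resources module. The template file name is a hypothetical placeholder, and the sketch assumes an authenticated Az PowerShell session.

```powershell
## Deploy the edited template to a new resource group in the new region (sketch)
$newResourceGroupName = "[Name for the new resource group]"
$newLocation          = "[Name of the new region]"
$templateFile         = "./template.dr.json"   # hypothetical path to the edited template

# Create the resource group in the same region as the template's location property.
New-AzResourceGroup -Name $newResourceGroupName -Location $newLocation

New-AzResourceGroupDeployment `
    -ResourceGroupName $newResourceGroupName `
    -TemplateFile $templateFile
```

If the template declares parameters without default values, the deployment cmdlet prompts for them or accepts them through a parameter file.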

Registering the new agent

Follow the steps within the deploy an Azure Storage Mover agent article to register a new agent in the new Storage Mover resource.

Assigning the agent to job definitions

After the new agent has been registered and reports as online, use the Azure portal or PowerShell to associate the existing job definitions to the new agent. The following PowerShell example is provided for convenience.

See the define a new migration job article for guidance on how to access the job definitions for your project.


## Update the agent in a job definition resource
$resourceGroupName  = "[Your resource group name]"
$storageMoverName   = "[Your storage mover name]"
$projectName        = "[Your project name]"
$jobDefName         = "[Your job definition name]"
$agentName          = "[The name of an agent previously registered to the same storage mover resource]"

Update-AzStorageMoverJobDefinition `
    -ResourceGroupName $resourceGroupName `
    -StorageMoverName $storageMoverName `
    -ProjectName $projectName `
    -Name $jobDefName `
    -AgentName $agentName

Granting agent access to the target storage container

To perform a migration job successfully, the agent's managed identity needs the data contributor role on the target. Grant the Hybrid Compute resource's system-assigned managed identity access to the target storage account resource. The assign a managed identity access to a resource article provides guidance on how to grant access to the target resource.
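As one possible approach, the role assignment can be created with the New-AzRoleAssignment cmdlet. This sketch assumes a blob container target and therefore uses the built-in Storage Blob Data Contributor role. That role name is an assumption, because this article doesn't name the exact role. Both placeholder values are hypothetical and must be replaced with your own.

```powershell
## Grant the agent's managed identity access to the migration target (sketch)
$principalId = "[Object ID of the Hybrid Compute machine's system-assigned managed identity]"
$scope       = "[Resource ID of the target storage account or container]"

# Assumption: the target is a blob container, so the blob data role applies.
New-AzRoleAssignment `
    -ObjectId $principalId `
    -RoleDefinitionName "Storage Blob Data Contributor" `
    -Scope $scope
```

Scoping the assignment to the target container rather than the whole storage account limits the access the agent receives.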

You're now ready to start migration jobs using the newly deployed Storage Mover resources.

Next steps