Protect VM workloads with Azure Site Recovery on Azure Stack HCI (preview)
Applies to: Azure Stack HCI, version 22H2 and later
This guide describes how to protect Windows and Linux VM workloads running on your Azure Stack HCI clusters in the event of a disaster. You can use Azure Site Recovery to replicate your on-premises Azure Stack HCI virtual machines (VMs) into Azure and protect your business-critical workloads.
This feature is enabled on Azure Stack HCI clusters running the May 2023 cumulative update of version 22H2 and later.
Azure Site Recovery with Azure Stack HCI
Azure Site Recovery is an Azure service that replicates workloads running on VMs so that your business-critical infrastructure is protected if there's a disaster. For more information about Azure Site Recovery, see About Site Recovery.
The disaster recovery strategy for Azure Site Recovery consists of the following steps:
- Replication - Replication copies the VM's virtual hard disk (VHD) to an Azure Storage account, which protects the VM if there's a disaster.
- Failover - Once the VM is replicated, you fail it over and run it in Azure. You can also perform a test failover, without impacting your primary VMs, to test the recovery process in Azure.
- Re-protect - VMs are replicated back from Azure to the on-premises cluster.
- Failback - You fail back from Azure to the on-premises cluster.
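The four stages above happen in a fixed order. As a rough illustration only, the following Python sketch models that order as a small state machine; `DrState`, `TRANSITIONS`, and `advance` are hypothetical names, not part of any Azure Site Recovery API. A test failover isn't modeled, because it doesn't affect the primary VM.

```python
from enum import Enum


class DrState(Enum):
    """Hypothetical lifecycle states of a VM protected with Site Recovery."""
    REPLICATING_TO_AZURE = "replicating to Azure"
    RUNNING_IN_AZURE = "running in Azure after failover"
    REPLICATING_TO_ON_PREM = "replicating back to the on-premises cluster"
    RUNNING_ON_PREM = "running on-premises after failback"


# Allowed (state, action) -> next state, following the order of the stages above.
TRANSITIONS = {
    (DrState.REPLICATING_TO_AZURE, "failover"): DrState.RUNNING_IN_AZURE,
    (DrState.RUNNING_IN_AZURE, "reprotect"): DrState.REPLICATING_TO_ON_PREM,
    (DrState.REPLICATING_TO_ON_PREM, "failback"): DrState.RUNNING_ON_PREM,
}


def advance(state: DrState, action: str) -> DrState:
    """Return the state reached by `action`, or raise ValueError if it's out of order."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"can't {action} while {state.value}") from None


if __name__ == "__main__":
    state = DrState.REPLICATING_TO_AZURE
    for action in ("failover", "reprotect", "failback"):
        state = advance(state, action)
        print(f"{action}: {state.value}")
```

Trying an action out of order, such as failing back before failing over, raises an error in this sketch.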
In the current implementation of Azure Site Recovery integration with Azure Stack HCI, you start disaster recovery and prepare the infrastructure from the Azure Stack HCI cluster resource in the Azure portal. After the preparation is complete, you finish the remaining steps from the Site Recovery resource (the Recovery Services vault) in the Azure portal.
Azure Site Recovery doesn't support the replication, failover, and failback of the Arc resource bridge and Arc VMs.
The following diagram illustrates the overall workflow of Azure Site Recovery working with Azure Stack HCI.
Here are the main steps that occur when using Site Recovery with an Azure Stack HCI cluster:
- Start with a registered Azure Stack HCI cluster on which you enable Azure Site Recovery.
- Make sure that you meet the prerequisites before you begin.
- Create the following resources from your Azure Stack HCI cluster resource in the Azure portal:
  - Recovery Services vault
  - Hyper-V site
  - Replication policy
- Once you've created all the resources, prepare the infrastructure.
- Enable VM replication. Complete the remaining replication steps in the Recovery Services vault in the Azure portal, and begin replication.
- Once the VMs are replicated, you can fail them over and run them in Azure.
The following table lists the scenarios that are supported for Azure Site Recovery and Azure Stack HCI.
Fail over Azure Stack HCI VMs to Azure followed by failback

| Azure Stack HCI VM details | Failover | Failback |
|---|---|---|
| Windows Gen 1 | Failover to Azure | Failback on same or different host as failover |
| Windows Gen 2 | Failover to Azure | Failback on same or different host as failover |
| Linux Gen 1 | Failover to Azure | Failback on same or different host as failover |
Manual intervention is needed if the on-premises VM is deleted from Azure Stack HCI after failover and you then fail back to the same or a different host.
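To check a VM against the table above, you could encode the listed configurations as data. This is a minimal sketch; the `is_listed` helper is hypothetical, and a configuration that's absent from the table is reported only as "not listed", not as confirmed unsupported.

```python
# (OS family, Hyper-V VM generation) pairs listed in the supported-scenarios table.
# For each, the table lists failover to Azure and failback to the same or a
# different host.
LISTED_CONFIGURATIONS = {
    ("Windows", 1),
    ("Windows", 2),
    ("Linux", 1),
}


def is_listed(os_family: str, generation: int) -> bool:
    """Return True if the table lists this OS family and VM generation."""
    return (os_family, generation) in LISTED_CONFIGURATIONS


if __name__ == "__main__":
    for os_family, generation in [("Windows", 2), ("Linux", 2)]:
        verdict = "listed" if is_listed(os_family, generation) else "not listed"
        print(f"{os_family} Gen {generation}: {verdict}")
```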
Prerequisites and planning
Before you begin, make sure to complete the following prerequisites:
- The Hyper-V VMs that you intend to replicate must be highly available. If a VM isn't highly available, its replication fails. For more information, see How to make an existing Hyper-V machine VM highly available.
- Make sure that Hyper-V is set up on the Azure Stack HCI cluster.
- The servers hosting the VMs you want to protect must have internet access to replicate to Azure.
- The Azure Stack HCI cluster must already be registered.
- The cluster must be running the May 2023 cumulative update for Azure Stack HCI, version 22H2.
  - If you're running an earlier build, the Azure portal indicates that disaster recovery isn't supported, because managed identity isn't enabled on older versions.
  - Run the repair registration cmdlet to ensure that a managed identity is created for your Azure Stack HCI resource, and then retry the workflow. For more information, go to Enable enhanced management from Azure for Azure Stack HCI.
- The cluster must be Arc-enabled. If it isn't, the Azure portal shows an error indicating that the Capabilities tab isn't available.
- You need owner permissions on the Recovery Services vault to assign permissions to the managed identity. You also need read/write permissions on the Azure Stack HCI cluster resource and its child resources.
- Review the caveats associated with the implementation of this feature.
- Review the capacity planning tool to evaluate the requirements for successful replication and failover.
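Before starting, it can help to record each prerequisite above as a yes/no fact and report what's missing. The following sketch assumes you collect these facts yourself; `ClusterFacts`, `MESSAGES`, and `unmet_prerequisites` are hypothetical, and the code doesn't query Azure or the cluster.

```python
from dataclasses import dataclass, fields


@dataclass
class ClusterFacts:
    """Hypothetical yes/no answers you gather before enabling disaster recovery."""
    vms_highly_available: bool
    hyper_v_set_up: bool
    servers_have_internet_access: bool
    cluster_registered: bool
    has_may_2023_update: bool
    arc_enabled: bool
    owner_on_vault: bool
    read_write_on_cluster_resource: bool


# Human-readable text for each prerequisite, keyed by the field that records it.
MESSAGES = {
    "vms_highly_available": "Make the VMs to replicate highly available.",
    "hyper_v_set_up": "Set up Hyper-V on the cluster.",
    "servers_have_internet_access": "Give the servers hosting the VMs internet access.",
    "cluster_registered": "Register the Azure Stack HCI cluster.",
    "has_may_2023_update": "Install the May 2023 cumulative update for version 22H2.",
    "arc_enabled": "Arc-enable the cluster.",
    "owner_on_vault": "Get owner permissions on the Recovery Services vault.",
    "read_write_on_cluster_resource": "Get read/write permissions on the cluster resource and its child resources.",
}


def unmet_prerequisites(facts: ClusterFacts) -> list[str]:
    """Return the message for every prerequisite that isn't met, in declaration order."""
    return [MESSAGES[f.name] for f in fields(facts) if not getattr(facts, f.name)]


if __name__ == "__main__":
    facts = ClusterFacts(True, True, True, True, False, True, True, True)
    for message in unmet_prerequisites(facts):
        print("-", message)
```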
Step 1: Prepare infrastructure on your target host
To prepare the infrastructure, you select or create a Recovery Services vault and a Hyper-V site, install the Site Recovery extension, and associate a replication policy with the cluster nodes.
On your Azure Stack HCI target cluster, follow these steps to prepare infrastructure:
In the Azure portal, go to the Overview pane of the target cluster resource that hosts the VMs you want to protect.
In the right pane, go to the Capabilities tab and select the Disaster recovery tile. Because managed identity is enabled on your cluster, disaster recovery should be available.
In the right pane, go to Protect and select Protect VM workloads.
On the Replicate VMs to Azure pane, select Prepare infrastructure.
On the Prepare infrastructure pane, select an existing Recovery Services vault or create a new one. You use this vault to store the configuration information for virtual machine workloads. For more information, see Recovery Services vault overview.
If you choose to create a new Recovery Services vault, the subscription and resource group are automatically populated.
Provide a vault name, and select the same location as the one where the cluster is deployed.
Accept the defaults for other settings.
You need owner permissions on the Recovery Services vault to assign permissions to the managed identity, and read/write permissions on the Azure Stack HCI cluster resource and its child resources.
Select Review + Create to start the vault creation. For more information, see Create and configure a Recovery Services vault.
Select an existing Hyper-V site or create a new site.
Select an existing replication policy or create a new one. This policy is used to replicate your VM workloads. For more information, see Replication policy. After the policy is created, select OK.
Select Prepare infrastructure. When you select Prepare infrastructure, the following actions occur:
- A resource group containing the storage account, the specified vault, and the replication policy is created in the specified location.
- An Azure Site Recovery agent is automatically downloaded on each node of your cluster that hosts the VMs.
- Managed identity gets the vault registration key file from the Recovery Services vault that you created, and the key file is then used to complete the installation of the Azure Site Recovery agent.
- The replication policy is associated with the specified Hyper-V site, and the target cluster host is registered with the Azure Site Recovery service.
If you don't have owner-level access to the subscription or resource group where you create the vault, you see an error stating that you aren't authorized to perform the action.
Depending on the number of nodes in your cluster, the infrastructure preparation could take several minutes. You can watch the progress by going to Notifications (the bell icon at the top right of the window).
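The per-node part of Prepare infrastructure, together with the authorization error described above, can be sketched as follows. This is a hypothetical model, not the extension's actual logic: `install_and_register` is an assumed callback standing in for the agent download, key-file installation, and registration on each node, and `AuthorizationError` and `PreparationResult` are illustrative names.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


class AuthorizationError(Exception):
    """Stands in for the portal's 'not authorized to perform the action' error."""


@dataclass
class PreparationResult:
    registered_nodes: list[str] = field(default_factory=list)
    failed_nodes: list[str] = field(default_factory=list)


def prepare_infrastructure(
    nodes: Iterable[str],
    has_owner_access: bool,
    install_and_register: Callable[[str], bool],
) -> PreparationResult:
    """Sketch of the per-node part of Prepare infrastructure.

    `install_and_register` is a hypothetical callback that returns True when
    the agent is installed and the node is registered.
    """
    if not has_owner_access:
        # In this sketch, missing owner-level access stops preparation before any node.
        raise AuthorizationError("You don't have authorization to perform the action.")
    result = PreparationResult()
    for node in nodes:
        if install_and_register(node):
            result.registered_nodes.append(node)
        else:
            # A node that fails here may leave the VMs it hosts unprotected.
            result.failed_nodes.append(node)
    return result
```

Collecting failed nodes instead of stopping at the first failure reflects the known issue in which a single node fails to install or register while the others succeed.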
Step 2: Enable replication of VMs
After the infrastructure preparation is complete, follow these steps to select the VMs to replicate.
On Step 2: Enable replication, select Enable replication. You're redirected to the Recovery Services vault, where you can specify the VMs to replicate.
Select Replicate, and from the dropdown list select Hyper-V machines to Azure.
On the Source environment tab, specify the source location for your Hyper-V site; in this case, the Hyper-V site you set up on your Azure Stack HCI cluster. Select Next.
On the Target environment tab, complete these steps:
For Subscription, enter or select the subscription.
For Post-failover resource group, select the resource group name to which you fail over. When the failover occurs, the VMs in Azure are created in this resource group.
For Post-failover deployment model, select Resource Manager. The Azure Resource Manager deployment is used when the failover occurs.
For Storage account, enter or select an existing storage account associated with the subscription that you chose. It can be a standard or a premium storage account, and it's used for the VM's replication.
For the network configuration of the VMs that you've selected to replicate to Azure, provide a virtual network and a subnet to associate with the VMs in Azure. To create this network, see the instructions in Create an Azure network for failover.
You can also choose to do the network configuration later.
Once the VM is replicated, you can select it, go to Compute and Network settings, and provide the network information.
On the Virtual machine selection tab, select the VMs to replicate, and then select Next. Make sure to review the capacity requirements for protecting the VM.
On the Replication settings tab, select the operating system type, the operating system disk, and the data disks for the VM you intend to replicate to Azure, and then select Next.
On the Replication policy tab, verify that the correct replication policy is selected. The selected policy should be the same replication policy that you created when preparing the infrastructure. Select Next.
On the Review tab, review your selections, and then select Enable Replication.
A notification indicating that the replication job is in progress is displayed. Go to Protected items > Replication items to view the status of the replication health and the status of the replication job.
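The Target environment choices described above can be thought of as a small configuration record. The following sketch is an illustration under stated assumptions: `TargetEnvironment` and its field names are hypothetical, and the rule that a virtual network and subnet are set together is an assumption of this sketch rather than a documented requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetEnvironment:
    """Hypothetical record of the choices made on the Target environment tab."""
    subscription: str
    post_failover_resource_group: str
    storage_account: str
    deployment_model: str = "Resource Manager"
    # Network settings are optional here; they can be provided after replication.
    virtual_network: Optional[str] = None
    subnet: Optional[str] = None

    def problems(self) -> list[str]:
        """Return a list of issues; an empty list means the choices look complete."""
        issues = []
        for name in ("subscription", "post_failover_resource_group", "storage_account"):
            if not getattr(self, name):
                issues.append(f"{name} is required")
        if self.deployment_model != "Resource Manager":
            issues.append("post-failover deployment model should be Resource Manager")
        # Assumption for this sketch: a network and subnet are given together or not at all.
        if (self.virtual_network is None) != (self.subnet is None):
            issues.append("provide both a virtual network and a subnet, or neither")
        return issues
```

For example, `TargetEnvironment("contoso-sub", "rg-dr", "drstorage01").problems()` returns an empty list, because the network settings can be left for later; all three names are placeholders.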
To monitor the VM replication, follow these steps.
To view the replication health and status, select the VM and go to Overview. You can see the percentage completion of the replication job.
To see a more granular job status and the job ID, select the VM and go to its Properties.
To view the disk information, go to Disks. Once the replication is complete, the Operating system disk and Data disk should show as Protected.
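The monitoring steps above come down to waiting until every disk reports Protected. A minimal polling sketch follows; `fetch_disk_statuses` is a hypothetical callback that you'd connect to whatever status source you use, and any status string other than Protected is a placeholder.

```python
import time
from collections.abc import Callable


def replication_complete(disk_statuses: dict[str, str]) -> bool:
    """True when there's at least one disk and every disk reports 'Protected'."""
    return bool(disk_statuses) and all(s == "Protected" for s in disk_statuses.values())


def wait_until_protected(
    fetch_disk_statuses: Callable[[], dict[str, str]],
    interval_s: float = 60.0,
    max_polls: int = 120,
) -> bool:
    """Poll a hypothetical status source until all disks are Protected.

    Returns True once replication is complete, or False after `max_polls` attempts.
    """
    for attempt in range(max_polls):
        if replication_complete(fetch_disk_statuses()):
            return True
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    return False
```

Requiring at least one disk avoids reporting success when no disk information has been returned yet.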
The next step is to configure a test failover.
Step 3: Configure and run a test failover in the Azure portal
Once the replication is complete, the VMs are protected. We recommend that you configure failover settings and run a test failover when you set up Azure Site Recovery.
To prepare for failover to an Azure VM, complete the following steps:
If you didn't specify the network configuration for the replicated VM, you can complete that configuration now.
- First, make sure that an Azure network is set up for test failover by following the instructions in Create a network for test failover.
- Select the VM, go to Compute and Network settings, and specify the virtual network and subnet. The failed-over VM in Azure attaches to this virtual network and subnet.
Once replication is complete and the VM's status shows Protected, you can start a test failover.
To run a test failover, see the detailed instructions in Run a disaster recovery drill to Azure.
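The two conditions above, a Protected status and a configured virtual network and subnet, can be checked together before you start a test failover. The following is a hypothetical helper for illustration, not an Azure Site Recovery API; status strings other than Protected are placeholders.

```python
from typing import Optional


def failover_readiness_issues(
    status: str,
    virtual_network: Optional[str],
    subnet: Optional[str],
) -> list[str]:
    """Return reasons a test failover shouldn't start yet; an empty list means ready."""
    issues = []
    if status != "Protected":
        issues.append(f"VM status is '{status}', not 'Protected'")
    if not virtual_network or not subnet:
        issues.append("set a virtual network and subnet in Compute and Network settings")
    return issues
```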
Step 4: Create Recovery Plans
Recovery Plan is a feature in Azure Site Recovery that lets you fail over and recover an entire application made up of a collection of VMs. Although you can recover protected VMs individually, adding the VMs that make up an application to a recovery plan lets you fail over the entire application through that plan.
You can also use the test failover feature of Recovery Plan to test the recovery of the application. Recovery Plan lets you group VMs, sequence the order in which they should be brought up during a failover, and automate other steps to be performed as part of the recovery process. Once you've protected your VMs, you can go to the Azure Site Recovery vault in the Azure portal and create recovery plans for these VMs. Learn more about recovery plans.
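The core idea of a recovery plan, ordered groups of VMs, can be sketched in a few lines. In this hypothetical model, groups are brought up in the order they were added, and VMs in the same group share a step number; `RecoveryPlan` and the VM names are illustrative, and the sketch omits the other automated recovery steps the real feature supports.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass, field


@dataclass
class RecoveryPlan:
    """Hypothetical model of a recovery plan: ordered groups of VM names."""
    name: str
    groups: list[list[str]] = field(default_factory=list)

    def add_group(self, vms: Iterable[str]) -> None:
        """Append a group; groups are brought up in the order they're added."""
        self.groups.append(list(vms))

    def failover_sequence(self) -> Iterator[tuple[int, str]]:
        """Yield (group number, VM name) in the order the VMs would be brought up."""
        for number, group in enumerate(self.groups, start=1):
            for vm in group:
                yield number, vm


if __name__ == "__main__":
    plan = RecoveryPlan("line-of-business-app")
    plan.add_group(["sql01"])           # database first
    plan.add_group(["app01", "app02"])  # then the application tier
    plan.add_group(["web01"])           # web front end last
    for group, vm in plan.failover_sequence():
        print(group, vm)
```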
Step 5: Fail over to Azure
To fail over to Azure, you can follow the instructions in Fail over Hyper-V VMs to Azure.
Caveats
Consider the following information before you use Azure Site Recovery to protect your on-premises VM workloads by replicating those VMs to Azure.
- Extensions installed by Arc aren’t visible on the Azure VMs. The Arc server will still show the extensions that are installed, but you can't manage those extensions (for example, install, upgrade, or uninstall) while the server is in Azure.
- Guest Configuration policies won't run while the server is in Azure, so any policies that audit the OS security/configuration won't run until the machine is migrated back on-premises.
- Log data (including Sentinel, Defender, and Azure Monitor data) is associated with the Azure VM while the server is in Azure. Historical data remains associated with the Arc server. If the server is migrated back on-premises, its log data is associated with the Arc server again. You can still find all the logs by searching by computer name rather than by resource ID, but note that the Azure portal looks up data by resource ID, so you see only a subset on each resource.
- If there's any chance that the server will be migrated back on-premises, we strongly recommend that you don't install the Azure VM Guest Agent, to avoid conflicts with Arc. If you need to install the guest agent, make sure that the VM has extension management disabled. If you try to install or manage extensions by using the Azure VM guest agent when extensions installed by Arc already exist on the same machine (or vice versa), you run into problems, because each agent is unaware of the other's extension installations and encounters state reconciliation issues.
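The log-data caveat is easier to see with a small example. This sketch uses made-up records and placeholder resource IDs to show why a search by computer name returns the full history while a search by a single resource ID returns only a subset; `LogRecord`, `by_computer`, and `by_resource_id` are hypothetical and don't call any Azure Monitor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    computer: str     # computer name, which stays the same wherever the server runs
    resource_id: str  # Azure resource the record is associated with
    message: str


def by_computer(records: list[LogRecord], computer: str) -> list[LogRecord]:
    return [r for r in records if r.computer == computer]


def by_resource_id(records: list[LogRecord], resource_id: str) -> list[LogRecord]:
    return [r for r in records if r.resource_id == resource_id]


# Placeholder IDs; real Azure resource IDs are longer paths.
ARC_ID = "<arc-server-resource-id>"
AZURE_VM_ID = "<azure-vm-resource-id>"

records = [
    LogRecord("web01", ARC_ID, "before failover"),
    LogRecord("web01", AZURE_VM_ID, "while running in Azure"),
    LogRecord("web01", ARC_ID, "after failback"),
]

if __name__ == "__main__":
    print(len(by_computer(records, "web01")))    # 3: the full history
    print(len(by_resource_id(records, ARC_ID)))  # 2: only the on-premises records
```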
Known issues
Here's a list of known issues and the associated workarounds in this release:
| Issue | Workaround |
|---|---|
| When you register Azure Site Recovery with a cluster, a node fails to install Azure Site Recovery or to register with the Azure Site Recovery service. | In this instance, your VMs may not be protected. Verify that all servers in the cluster are registered in the Azure portal by going to Recovery Services vault > Jobs > Site Recovery Jobs. |
| The Azure Site Recovery agent fails to install. No error details are seen at the cluster or server levels in the Azure Stack HCI portal. | The agent installation fails for one of the following reasons: Hyper-V isn't set up on the cluster, or the Hyper-V host is already associated with a Hyper-V site and you're trying to install the extension with a different Hyper-V site. |
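For the first known issue, the check amounts to comparing two lists: the nodes in your cluster and the servers shown as registered in the vault. The following sketch assumes you've already gathered both lists; `unregistered_nodes` and the node names are hypothetical.

```python
from collections.abc import Iterable


def unregistered_nodes(cluster_nodes: Iterable[str], registered_servers: Iterable[str]) -> list[str]:
    """Return cluster nodes that don't appear among the registered servers.

    Names are compared case-insensitively, because Windows computer names
    aren't case-sensitive.
    """
    registered = {name.lower() for name in registered_servers}
    return sorted(node for node in cluster_nodes if node.lower() not in registered)


if __name__ == "__main__":
    # Hypothetical node names; gather the real lists from your cluster and vault.
    missing = unregistered_nodes(["HCI-NODE1", "HCI-NODE2", "HCI-NODE3"], ["hci-node1", "hci-node3"])
    print(missing)  # ['HCI-NODE2']
```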