Hello,
I am in the process of configuring Azure Site Recovery with private endpoints in a sandbox environment to provide Azure-to-Azure (A2A) inter-region failover and failback capabilities.
I followed all instructions outlined in the following article: https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-how-to-enable-replication-private-endpoints
My sandbox design consists of the following configuration:
- Single Azure subscription.
- The Azure Recovery Services vault is configured to automatically update the ASR extension on protected VMs via an Azure Automation account that is not configured with private endpoints.
- Protected VM operating systems are: Linux (Red Hat 7.4) and Linux (CentOS 7.9.2009).
- Primary Region: East US
Within this region, I have a single VNet that is not peered to any other VNets (completely isolated) containing 2 subnets: 1 subnet containing the source VM I would like to protect, and 1 subnet designated for private endpoints for both the Azure Recovery Services (ARS) vault and local cache storage account.
The local cache storage account in the primary region is configured with private endpoints and not accessible publicly.
- Failover Region: West US
Similar to the primary region, with a separate isolated VNet containing the same subnets as the primary region VNet.
- Private endpoints to the ARS vault and local cache storage account are hosted within this region.
- There are no NSG rules or NVAs/firewalls configured to block outbound network traffic from the protected VMs (if there were, I would not have been able to protect them and register them in the vault).
Everything appears to be functioning correctly. Replication health for the VMs in the primary region is showing a green "healthy", and I am able to fail over both of my source VMs to the secondary region and fail back to the primary region (re-protecting the VMs in each region after each failover/failback, of course).
I am, however, receiving several "Mobility service periodic refresh failed." errors in the ARS vault's event log, as shown below:
Event Name: Mobility service periodic refresh failed.
Event Type: VmHealth
Source: asrtestcentos02
Associated servers: asr-a2a-default-eastus
Time: 2/18/2023, 8:44:59 PM
Error ID: 152003
Error Message: Site recovery mobility service refresh operation with Recovery Services vault failed. URI https://<vault id redacted>-asr-pod01-rcm1.wus.privatelink.siterecovery.windowsazure.com, Error: 20505.
Possible causes: You might have an NSG rule or firewall setting which prevents mobility service from accessing Site recovery service endpoints.
Recommendation:
1. If you are using firewall proxy to control outbound network connectivity on the VM, ensure you allow communication to the prerequisite URLs or datacenter IP ranges. Refer to https://aka.ms/a2a-firewall-proxy-guidance
2. If you are using Azure Network security group (NSG) rules to control outbound network connectivity on the VM, ensure you allow communication to the prerequisite URLs or datacenter IP ranges. Refer to https://aka.ms/a2a-nsg-guidance
3. If storage account is deleted, disable replication on the VM and enable replication again.
If the events are continuous even after above issues are fixed, contact support.
Related links:
https://aka.ms/a2a-firewall-proxy-guidance
https://aka.ms/a2a-nsg-guidance
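Since the event points at connectivity to the URI above, one way to test that from the protected VM is a script along the following lines. This is only a minimal sketch, not something taken from the Microsoft article. It assumes Python 3 is available on the VM, and the function names (`is_private_address`, `check_endpoint`) are my own, hypothetical ones. It resolves the host name from the event, reports whether each resolved address is in a private range (which I would expect if the private endpoint DNS records are being used), and attempts a TCP connection on port 443.

```python
import ipaddress
import socket
import sys


def is_private_address(ip):
    """Return True if ip is in a private range (for example RFC 1918)."""
    return ipaddress.ip_address(ip).is_private


def check_endpoint(host, port=443, timeout=5.0,
                   resolve=socket.getaddrinfo,
                   connect=socket.create_connection):
    """Resolve host, then try a TCP connection to each resolved address.

    resolve and connect are injectable so the logic can be exercised
    without network access; by default the real socket calls are used.
    """
    try:
        infos = resolve(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # DNS resolution failed, so no addresses can be tested.
        return {"host": host, "resolved": False,
                "error": str(exc), "addresses": []}

    addresses = []
    # info[4] is the socket address tuple; its first element is the IP.
    for ip in sorted({info[4][0] for info in infos}):
        try:
            with connect((ip, port), timeout=timeout):
                reachable = True
        except OSError:
            reachable = False
        addresses.append({"ip": ip,
                          "private": is_private_address(ip),
                          "reachable": reachable})
    return {"host": host, "resolved": True,
            "error": None, "addresses": addresses}


if __name__ == "__main__":
    # Usage: python3 check_endpoint.py <host name from the event's URI>
    if len(sys.argv) > 1:
        print(check_endpoint(sys.argv[1]))
```

Run on the source VM with the full host name from the URI field as the only argument. A result with `resolved` set to False, a non-private address, or `reachable` set to False would each suggest a different place to look (DNS records, private DNS zone linkage, or network path), although the script cannot tell which ASR component is actually making the failing call.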
Again, all failover and failback functionality is working from what I can tell, but the error above will show up in Azure Monitor and result in an email notification.
- What is the cause of this error?
- Can this error be safely ignored, or does this indicate another underlying issue that can impact failover/failback operations?
- What activities are actually performed during a "mobility service periodic refresh" operation?
Thanks