Azure Site Recovery w/Private Endpoints - "Mobility service periodic refresh failed."

Question

Hello,

I am in the process of configuring Azure Site Recovery with private endpoints in a sandbox environment to provide Azure-to-Azure (A2A) inter-region failover and failback capabilities.

I followed all of the instructions in the following article: https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-how-to-enable-replication-private-endpoints

My sandbox environment consists of the following configuration:

Single Azure subscription.
The Azure Recovery Services vault is configured to automatically update the ASR extension on protected VMs via an Azure Automation account that is not configured with private endpoints.
Protected VM operating systems are Linux (Red Hat 7.4) and Linux (CentOS 7.9.2009).
Primary Region: East US. Within this region, I have a single VNet that is not peered to any other VNets (completely isolated) containing 2 subnets: 1 subnet containing the source VM I would like to protect, and 1 subnet designated for private endpoints for both the Azure Recovery Services (ARS) vault and the local cache storage account. The local cache storage account in the primary region is configured with private endpoints and is not publicly accessible.
Failover Region: West US. Similar to the primary region, with a separate isolated VNet containing the same subnets as the primary region VNet.
Private endpoints to the ARS vault and local cache storage account are hosted within this region.
There are no NSG rules or NVAs/firewalls configured to block outbound network traffic from the protected VMs (if there were, I would not have been able to protect them and register them in the vault).

Everything appears to be functioning correctly. Replication health for the VMs in the primary region shows a green "Healthy", and I am able to successfully fail over both of my source VMs to the secondary region and fail back to the primary region (re-protecting the VMs in each region after failover/failback, of course).

I am, however, receiving several "Mobility service periodic refresh failed." errors in the ARS vault's event log, as shown below:

Event Name: Mobility service periodic refresh failed.
Event Type: VmHealth
Source: asrtestcentos02
Associated servers: asr-a2a-default-eastus
Time: 2/18/2023, 8:44:59 PM
Error ID: 152003
Error Message: Site recovery mobility service refresh operation with Recovery Services vault failed. URI https://-asr-pod01-rcm1.wus.privatelink.siterecovery.windowsazure.com, Error: 20505.
Possible causes: You might have an NSG rule or firewall setting which prevents mobility service from accessing Site recovery service endpoints.
Recommendation:
1. If you are using firewall proxy to control outbound network connectivity on the VM, ensure you allow communication to the prerequisite URLs or datacenter IP ranges. Refer to https://aka.ms/a2a-firewall-proxy-guidance
2. If you are using Azure Network security group (NSG) rules to control outbound network connectivity on the VM, ensure you allow communication to the prerequisite URLs or datacenter IP ranges. Refer to https://aka.ms/a2a-nsg-guidance
3. If storage account is deleted, disable replication on the VM and enable replication again.
If the events are continuous even after above issues are fixed, contact support.
Related links:
https://aka.ms/a2a-firewall-proxy-guidance
https://aka.ms/a2a-nsg-guidance

Again, all failover and failback functionality is working as far as I can tell, but the error above shows up in Azure Monitor and results in an email notification.

What is the cause of this error?
Can this error be safely ignored, or does this indicate another underlying issue that can impact failover/failback operations?
What activities are actually performed during a "mobility service periodic refresh" operation?

Thanks

Answer

Hello @Angelo Cuesta Jr, thank you for reaching out to us on the Microsoft Q&A platform. Sorry for the inconvenience this must have caused, and apologies for the delayed response.

I see that you have been noticing an error "Mobility service periodic refresh failed."

However, going by the current state of the service as you described it, there doesn't seem to be any problem. I think the issue is intermittent in nature.

Just to isolate the issue, please ensure that you have allowed all of the URLs required by ASR in your firewall and NVA, as mentioned in the article here, after which the mobility agent refresh should complete without any issues.
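
As a rough way to check the name resolution and connectivity side of this from a protected VM, here is a minimal Python sketch (standard library only). It assumes that, with private endpoints in place, the privatelink Site Recovery hostname reported in the event should resolve to a private IP address in your private endpoint subnet and accept TCP connections on port 443. The function names are hypothetical helpers written for this illustration and are not part of any ASR tooling, and a passing result does not rule out an intermittent failure.

```python
import ipaddress
import socket
from typing import Dict, List


def classify_addresses(addresses: List[str]) -> Dict[str, str]:
    """Label each IP string 'private' or 'public' using the ipaddress module."""
    return {
        addr: "private" if ipaddress.ip_address(addr).is_private else "public"
        for addr in addresses
    }


def resolve_addresses(hostname: str, port: int = 443) -> List[str]:
    """Resolve hostname with the OS resolver; return the unique IPs, sorted."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(hostname: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to hostname:port succeeds within timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoint(hostname: str) -> None:
    """Print how the name resolves and whether TCP port 443 is reachable."""
    try:
        labels = classify_addresses(resolve_addresses(hostname))
    except socket.gaierror as exc:
        print(f"{hostname}: DNS resolution failed ({exc})")
        return
    for addr, label in labels.items():
        print(f"{hostname} -> {addr} ({label})")
    print(f"TCP 443 reachable: {can_connect(hostname)}")
```

To use it, call check_endpoint() with the full hostname from the URI in your event's error message. If the name resolves to a public address, or the connection fails, that would point toward the private DNS zone or an NSG/firewall rule rather than the mobility service itself.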

If this is still an issue, please do let me know.

If the response helped, do "Accept Answer" and up-vote it
