Site Recovery - Azure Regions - Failback

Nathan Farrar 67 Reputation points
2021-08-09T14:30:21.267+00:00

I'm finding that failover (and test failover) in Azure Site Recovery makes total sense and works well, but failback is a convoluted process that makes Site Recovery difficult to work with. I'm looking for input on this.

What I'd expect to be able to do is use a recovery plan to fail over a group of VMs into a separate region. I'd also expect to be able to fail those resources back simply, but that doesn't appear to be the case at all. Test failover is a useful tool, but it can cause issues and has limitations. Testing a single server isn't very useful; to get a true test, you'd need to make sure DNS is working and on-prem sites can access resources in Azure. Test failovers keep both machines running, which can cause DNS issues and prevents truly testing a DR scenario. It's a theoretical test IMO, not a true test (and doesn't pass as a true test for most companies).

Here is how I understand the only way to do a true failover and failback:

Assumptions - two regions with non-overlapping VNETs, an existing domain controller in both and VPN access to both.

  1. Failover VM(s) to secondary location (source VM is shut down, and a new VM is brought online and registers with the local DC for DNS).
  2. Confirm everything is working for application and remote access via VPN is good. - failover works well -
  3. Commit failover - unable to change recovery point after committing -
  4. Now VM is running in backup/recovery region.
  5. Must now delete source VM resources <- this is a pain point
  6. Re-protect VM(s) running in backup/recovery region <- VMs will be replicated back to original source region, must wait for replication to sync

-- "Failback" Process --

  1. Failover VM(s) again to original source Region
  2. Confirm everything is working again
  3. Delete VM resources in backup/recovery region.
  4. Re-protect VM(s) now running in the original source region - VMs will be replicated to the original backup/recovery region
  5. Delete VM resources in backup/recovery region.

Depending on the size and number of VMs, testing failover could be a very involved event with significant risk of issues due to all the steps. Recovery plans would seem to be a good solution, but they do not survive the "failback" process and are essentially useless after they fail over VMs: they only work in one direction and seem to lose track of what's going on after one use.

I'm using this Microsoft guide:
https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-tutorial-failover-failback

Seems Site Recovery is a half-baked solution for BC/DR, at least in this use case. I'm sure I'll be told to use the test failover option, but that won't work as a true test.

Thoughts? Is there a failback process I'm missing? I'm sure I can script some automation but I want to exhaust native built-in processes first.

Thanks!

Azure Site Recovery
An Azure native disaster recovery service. Previously known as Microsoft Azure Hyper-V Recovery Manager.

Accepted answer
  1. SadiqhAhmed-MSFT 49,331 Reputation points Microsoft Employee Moderator
    2021-08-10T10:46:06.653+00:00

    Hello @Anonymous - Thank you for reaching out!

    #5. Must now delete source VM resources <- this is a pain point

    • This is not needed. If you leave the source VM in a shut-down state, it can be reused as the failed-back VM, saving you replication costs, as we identify and replicate back only the changes.

    "Failback" Process --

    # 3. Delete VM resources in backup/recovery region.

    • This is an unnecessary step. When you click 'Re-protect' after failback, ASR cleans up the DR region for you.

    # 5. Delete VM resources in backup/recovery region.

    • You should not delete VM resources in the DR region after you re-protect from the primary region to the DR region, as your failover will fail if you do so.

    Depending on the size of the VMs and number of VMs, testing failover could be a very involved event with significant risk of issues due to all the steps. Recovery plans would seem to be a good solution but they do not survive the "failback" process and are essentially useless after they failover VMs, they only do one direction and seem to simply lose track of what's going on after one use.

    • Recovery Plans do allow you to re-protect and fail back VMs to your source region. It is a bi-directional tool.
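
    To make the corrected cycle easier to follow, here is a minimal sketch that models it as a small state machine. It is only an illustration and does not call Azure or any ASR API: the class, state, and action names are hypothetical, and it assumes that commit comes before re-protect in both directions (as in step 3 of the question). Note that there is no "delete" action anywhere in the loop.

```python
from enum import Enum


class State(Enum):
    PROTECTED = "protected"      # replicating from active to standby region
    FAILED_OVER = "failed over"  # running in the other region, not yet committed
    COMMITTED = "committed"      # failover committed, waiting for re-protect


# Allowed (state, action) pairs in this simplified, hypothetical model.
TRANSITIONS = {
    (State.PROTECTED, "failover"): State.FAILED_OVER,
    (State.FAILED_OVER, "commit"): State.COMMITTED,
    (State.COMMITTED, "reprotect"): State.PROTECTED,
}


class ReplicatedVm:
    """Toy model of one replicated VM; not an Azure SDK object."""

    def __init__(self, name, primary, recovery):
        self.name = name
        self.active_region = primary
        self.standby_region = recovery
        self.state = State.PROTECTED
        self.log = []

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} while {self.state.value}")
        if action == "failover":
            # The workload moves to the standby region, so the roles swap.
            self.active_region, self.standby_region = (
                self.standby_region,
                self.active_region,
            )
        self.state = TRANSITIONS[key]
        self.log.append((action, self.active_region))


def round_trip(vm):
    """Fail over to the DR region, then fail back, with no manual deletes."""
    for action in ("failover", "commit", "reprotect",   # primary -> DR
                   "failover", "commit", "reprotect"):  # DR -> primary
        vm.apply(action)
    return vm
```

    Running `round_trip(ReplicatedVm("app01", "eastus", "westus"))` (the VM name and regions are placeholders) leaves the VM active in its original region and protected again, which is the same sequence a recovery plan would repeat in each direction.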

    Hope this answers your questions!


    If the response helped, do "Accept Answer" and up-vote it.
