SQL AG Failover and Failback Question

TRDx2 141 Reputation points
2021-01-23T17:52:57.05+00:00

We are have a SQL AG on 3 Windows 2012R2 servers running SQL 2012 Enterprise. SQL1 and 2 are in one datacenter and SQL3 is in another datacenter. We are going to do a disaster recovery test where the main datacenter will be taken offline. When the primary data center goes offline I believe SQL1 and 2 will stay online and SQL3 will be kicked out of the cluster as it can't talk with the other two nodes. The databases will go into Not Synchronizing. At this point I would have to force the Window cluster service online with force quorum with the cluster service started I should be able to force the databases online with data loss. Similar reference is found at this link. This should get me running right?

Now coming back, when the links come back online will SQL1 and 2 go offline as soon as they see SQL3 is active? Or will I have created a split brain scenario?

Any help from someone that has gone through this would be greatly appreciated.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,147 questions
{count} votes

Accepted answer
  1. Sean Gallardy - MSFT 1,891 Reputation points Microsoft Employee
    2021-01-23T20:10:13.903+00:00

    Or will I have created a split brain scenario?

    You've technically created a split-brain when you did this: >At this point I would have to force the Window cluster service online with force quorum [...]

    Now coming back, when the links come back online will SQL1 and 2 go offline as soon as they see SQL3 is active?

    Long and the short of it is that SQL1 and SQL2 will have a different copy of the data than SQL3. You need to decide which one of those is the "correct" copy. Since SQL3 was forced for quorum the cluster services on 1 and 2 is going to resync with it and it's copy of the cluster, which means everything on 1 and 2 is going to go down. The databases don't match but they were forced failover so they won't attempt to automatically reconnect. You're going to have fun restoring databases on the node(s) that you choose don't have a correct copy of the "truth" and most like forcing failover again.

    Basically, I wouldn't do this. There is a reason it's called a disaster recovery option and not a "Test your BCDR" option. There are better ways of testing.


2 additional answers

Sort by: Most helpful
  1. David Browne - msft 3,851 Reputation points
    2021-01-23T19:22:13.267+00:00

    Now coming back, when the links come back online will SQL1 and 2 go offline as soon as they see SQL3 is active? Or will I have created a split brain scenario?

    You should plan for a split-brain scenario, as you typically can't prevent SQL1 and 2 from coming online and forming quorum before communicating with SQL3. So

    0 comments No comments

  2. CathyJi-MSFT 22,321 Reputation points Microsoft Vendor
    2021-01-25T10:29:43.593+00:00

    Hi @TRDx2 ,

    > Or will I have created a split brain scenario?

    Yes. According to MS document, the split-brain scenario is only possible when a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced failover. You manually performed a forced quorum operation, this cause split-brain.

    Suggest you read this thread Can auto failover cause split-brain? (SQL AlwaysOn).

    Best regards,
    Cathy


    If the response is helpful, please click "Accept Answer" and upvote it, thank you.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.