When doing a bubble test for our DR setup, will both sides of the AG be "live"?

Question

Ric Jones 1

I'll try to explain this as best I can so bear with me - if anything is unclear, please let me know.

We have a live server which is a 2-node Windows cluster with SQL 2017 on it. It has 2 Availability Groups on it that replicate all of the non-static DBs over to our DR site (a third SQL 2017 machine, which is not part of the Windows cluster).

My boss has asked me a question and I don't know the answer to it. He is concerned that when we do a bubble test of the DR process and fail the AGs over, at some point both our Live server and our DR server could be seen as live and be receiving updates. If that is the case, how do we deal with it on failback?

Now, my thoughts are that:

1/. We'd be doing the bubble test at an agreed time so there should be no-one using the system at the time apart from the resourced testers
2/. The application that is involved here would have all DNS changes done, as per a real DR failure, and so would all be pointing to the DR server
3/. Given the setup, we'd be following the steps from this MS doc for the failover (the Forced Failover with data loss section), and the AG on the live server would be offline

perform-a-planned-manual-failover-of-an-availability-group-sql-server

So, unless someone connected directly to the Live machine, via SSMS for example, and updated some data, I can't see how anything would see Live as being live. But he's more knowledgeable than me and needs an answer.
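One way to check this during the test is to ask each instance which role it believes it holds. Below is a minimal sketch in Python, assuming the rows have already been fetched from each server with the query in `ROLE_QUERY`. How you connect is left out, the server and AG names in the usage note are hypothetical, and the comparison only makes sense if the same AG name is visible on both servers (a plain AG rather than two separate AGs joined in a distributed AG).

```python
from collections import defaultdict

# T-SQL to run on each instance; returns the local replica's role per AG.
ROLE_QUERY = """
SELECT ag.name AS ag_name, ars.role_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_groups AS ag ON ag.group_id = ars.group_id
WHERE ars.is_local = 1;
"""


def find_dual_primaries(rows_by_server):
    """Return {ag_name: [servers]} for AGs where more than one server reports PRIMARY.

    rows_by_server maps a server name to the (ag_name, role_desc) rows
    that ROLE_QUERY returned on that server.
    """
    primaries = defaultdict(list)
    for server, rows in rows_by_server.items():
        for ag_name, role_desc in rows:
            if role_desc == "PRIMARY":
                primaries[ag_name].append(server)
    # Keep only the AGs that more than one server claims to be primary for.
    return {ag: sorted(servers) for ag, servers in primaries.items() if len(servers) > 1}
```

For example, calling `find_dual_primaries({"LIVE01": [("AG1", "PRIMARY")], "DR01": [("AG1", "PRIMARY")]})` would flag `AG1`, whereas a Live side reporting `RESOLVING` or `SECONDARY` would produce an empty result.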

2 answers

Answer 1

Tom Phillips 17,771

You did not give us enough information about your configuration to give an answer. What is your witness config? Is the DR server set for automatic failover?

Normally, a DR site would not have a vote or auto failover. This is for many, many reasons. So the DR server will never be "live" unless you manually set it that way. DR is NOT for temporary outages. It is for your server room burning to the ground or being destroyed in a tornado.
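The failover-mode question can be answered from the catalog views. As a rough sketch, assuming the rows come from running `FAILOVER_MODE_QUERY` on the current primary (the connection code is omitted and the server names in the usage note are hypothetical), this Python helper lists the replicas configured for automatic failover so you can confirm the DR server is not among them:

```python
# T-SQL to run on the current primary; one row per replica in each AG.
FAILOVER_MODE_QUERY = """
SELECT ag.name AS ag_name, ar.replica_server_name,
       ar.availability_mode_desc, ar.failover_mode_desc
FROM sys.availability_replicas AS ar
JOIN sys.availability_groups AS ag ON ag.group_id = ar.group_id;
"""


def automatic_failover_replicas(rows):
    """Return sorted (ag_name, server) pairs whose failover mode is AUTOMATIC.

    rows are (ag_name, replica_server_name, availability_mode_desc,
    failover_mode_desc) tuples as returned by FAILOVER_MODE_QUERY.
    """
    return sorted(
        (ag_name, server)
        for ag_name, server, _availability_mode, failover_mode in rows
        if failover_mode == "AUTOMATIC"
    )
```

If a hypothetical `DR01` shows up in the result, the DR replica can become primary without anyone asking it to, which is the situation described above as unusual for a DR site. This does not cover the witness and vote configuration, which lives in the Windows cluster rather than in SQL Server.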

Answer 2

Hi @Ric Jones ,

Welcome to Microsoft Q&A!
Please see this document about Failover Modes: https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/failover-and-failover-modes-always-on-availability-groups?view=sql-server-ver16
When an automatic failover occurs, the procedure is as follows:
If the server instance hosting the current primary replica is still running, it changes the state of the primary databases to DISCONNECTED and disconnects all clients.
If any log records are still waiting in the redo queue on the failover target replica, the secondary replica continues the redo to finish rolling forward the secondary database.
After completing the roll-forward operation, the secondary replica becomes the primary replica and its database becomes the primary database. The new primary replica rolls back any uncommitted transactions as soon as possible.
The new primary replica is briefly marked as NOT_SYNCHRONIZED until a secondary replica connects to it and a new session is established. The database transitions to the SYNCHRONIZED state as soon as a secondary replica connects to the new primary replica, regardless of whether the rollback operation has completed.
When the original primary replica is up and running again after troubleshooting, it finds that another availability replica has become the primary replica, so it takes the secondary role and its database becomes the secondary database. Once the new secondary database is connected to the new primary database, it starts synchronizing to catch up with the end of the primary database's log.

Best regards,
Seeya

Erland Sommarskog 122K Reputation points MVP Volunteer Moderator

2022-06-30T21:18:43.697+00:00

But Seeya, I think you are talking about a plain-vanilla Availability Group.

Since Ric's DR site is not part of the Windows cluster, the DR database is not part of the AG. It may be part of a Distributed Availability Group, though. Ric did not spell that out.

I have a recollection that you cannot really fail back from a DAG. And in events like those Tom discussed, that is not really of interest, since there is nothing to fail back to.

But obviously, if you are only doing a failover for test purposes, you want to be able to continue with the main site where you were.

In any case, I think it is a good idea to start the exercise by taking backups of the databases and closing down all incoming connections.
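For the backup step, a small Python sketch that generates one `BACKUP DATABASE` statement per database might look like this. The database names, backup directory, and the `_pre_dr_test` file suffix are all hypothetical, and using `COPY_ONLY` (so the regular backup chain is left undisturbed) is an assumption on my part rather than something required for the test.

```python
def copy_only_backup_sql(database, backup_dir):
    """Build a T-SQL copy-only full backup statement for one database."""
    # Bracket-quote the database name; a literal ] inside it must be doubled.
    quoted_name = "[" + database.replace("]", "]]") + "]"
    # Single quotes inside a T-SQL string literal must be doubled.
    path = f"{backup_dir}\\{database}_pre_dr_test.bak".replace("'", "''")
    return f"BACKUP DATABASE {quoted_name} TO DISK = N'{path}' WITH COPY_ONLY, CHECKSUM;"


def backup_script(databases, backup_dir):
    """Return the statements for all databases, one per line."""
    return "\n".join(copy_only_backup_sql(db, backup_dir) for db in databases)
```

Calling `backup_script(["Sales", "Orders"], "D:\\Backups")` yields two statements that can be reviewed and then run on the primary before incoming connections are closed and the failover begins.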
