Application lost connection to DB on 2017 Always On cluster

Cooper, David A. [ISS] (Contractor) 111 Reputation points
2023-10-04T00:04:11.68+00:00

We have 3-node SQL 2017 Always On cluster with 2 x HA and 1 x DR. We experienced an application outage reported around 1:00 AM possibly because the listener was not available. When we troubleshoot at around 7:00AM, the symptom is that we were not able to connect to the affected SQL instance using SSMS on primary node #1. However, we were able to connect to the instance on secondary node #2.

On node #2, we can see that one of the application databases was in synchronizing state, whereas the other databases used by this app were synchronized. I don't know if it matters, but each of the app databases is dedicated to a different client location.

We resolve the application outage by restarting SQL service on the secondary HA node #2. Then the app could connect and we could open the instance in SSMS on the primary.

In the AlwaysON_health extended event on node #1, we see some event records at 11:00 PM and the detail says:

Event: hadr_db_partner_set_sync_state

commit_policy: WaitForHarden

commit_policy_target: DoNothing

Sync_state: NOT

Then 20 minutes later:'

Event: hadr_db_partner_set_sync_state

commit_policy: WaitForHarden

commit_policy_target: WaitForHarden

Sync_state: LOG

Can anyone enlighten me on what's going on in this scenario and the root cause? Why would we not be able to open the instance in SSMS on node one when only one of the app DBs was not synchronizing? How to monitor and alert if this happens in the future?

SQL Server Other
{count} votes

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.