Why ALWAYSON AG goes to 'NOT SYNCRONIZING' when db are busy

Question

Why ALWAYSON AG goes to 'NOT SYNCRONIZING' when db are busy

Jiong AN 146

Everytime my alwayson db is busy, ALL the secondary dbs goes into 'NO SYNCRONIZING' mode. and stays until the server goes quiet again.

From some times ago, this is no longer the case. it stays in 'NO SYNCRONIZING' unless i remote desktop to the db server and open 'Failover Cluster Manager', the app opened and it took about 1 mins for the cluster info, nodes to show up. Then I checked the AG again, dbs start to go back to 'SYNCRONIZED'.

I know MS had a fix about 'NO SYNCRONIZING' at

https://support.microsoft.com/en-au/help/3034679/kb3034679-fix-alwayson-availability-groups-are-reported-as-not-synchro

but I have already patched my 2016 enterprise to the latest CU2, and still having problems.

I have checked error logs, it shows timeout happened.

e.g.
A connection timeout has occurred on a previously established connection to availability replica 'server1' with id [*******]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
35206

and

Always On Availability Groups connection with primary database terminated for secondary database 'db2' on the availability replica 'server1' with Replica ID: {******}. This is an informational message only. No user action is required.
35267

....

Strange thing is: it ONLY happens when db is busy. when db is not busy, nothing happens. So the although the error seems like a network issue but in fact is not.

My Questions:

Why does db go to 'NO SYNCRONIZING' and cluster connection timeout when db is busy?
when system is quiet, it stays as 'NO SYNCRONIZING' now, to fix it, i needs to open 'FAILOVER CLUSTER MANAGER', why is it a fix. does it restart some services or rejoined node to cluster? (since I can't see anything is the system logs and cluster logs also shows nothing). If so, can I run a powerscript command to do the same as open app.

Accepted answer

4 additional answers

Your answer

Answer 1

Why does db go to 'NO SYNCRONIZING' and cluster connection timeout when db is busy?

You have already said db is busy, when db is busy, you can think that there are limited resources but for busy db, the process and
its threads need more time to wait, to recovering and restoring.

Availability database status:

Not synchronizing

If the primary database is in this state, it means that the database is not ready to synchronize its transaction log with the corresponding secondary database.

If the secondary database is in this state, the database may

(1) Log synchronization is no longer performed due to connection problems or redo failure,
(2) The log synchronization with the main database is "suspended",
(3) Due to role switching, it is in the middle of transition.

You can check as that run cmd, and then ping listener's ip -t, then failover, there you can see one timeout, then connected, means that it is in switching time.

when system is quiet, it stays as 'NO SYNCRONIZING' now, to fix it, i needs to open 'FAILOVER CLUSTER MANAGER', why is it a fix. does it restart some services or rejoined node to cluster? (since I can't see anything is the system logs and cluster logs also shows nothing). If so, can I run a powerscript command to do the same as open app.

Why do you use latest SP and CU to have a test. Check why your db is always busy, you can classify them in different dbs to sharing the load.

Note:

A connection timeout has occurred on a previously established connection to availability replica 'server1' with id [***]. Either a networking or a firewall issue exists or >the availability replica has transitioned to the resolving role.35206

Note This problem might occur only on very powerful computers and when SQL Server is very busy. For example, in one scenario, this problem occurred on a very busy system with 24 cores.

Always On Availability Groups connection with primary database terminated for secondary database 'db2' on the availability replica 'server1' with Replica ID: {**}. This is >an informational message only. No user action is required.35267

This issue occurs because of a deadlock condition between a redo thread and a switch-role thread.

If your verison is meet the fixed versions, always have these issues, I recommend you to submit your issue to the Microsoft feedback at this link https://feedback.azure.com/forums/908035-sql-server . This site will serve as a connecting point between you and Microsoft, and ultimately the large community for you and Microsoft to interact with. Your feedback enables Microsoft to offer the best software and deliver superior services, meanwhile you can learn more about and contribute to the exciting projects on Microsoft feedback.

Or

If the issue still exists after you have tested, please don't hesitate to ask professional engineers for help, and they will deal with your problem separately and confidentially: https://support.microsoft.com/en-us/assistedsupportproducts

BR,
Mia

Jiong AN 146 Reputation points

2020-10-27T09:21:39.863+00:00

Hi Mia, thank you very much for your detail analytical explanation, please see my reply below.

Answer 2

Hi Mia, thank you very much for your detail analytical explanation:

In fact my link between primary and secondary has some network delays.

The 'no syncronizing' is happened on secondary.

Both primary and secondary are VM with 32Gb memory , 16 Cores
Microsoft SQL Server 2016 (SP2-CU14) (KB4564903) - 13.0.5830.85 (X64) Enterprise (64-bit) on Windows Server 2016 Standard 10.0 <X64> (Build 14393: ) (Hypervisor)
yes, they are powerful VMs.

Why my db busy? I see a lot of 'HADR_SYNC_COMMIT' and 'HADR_GROUP_COMMIT'. this could be contributed from the network latency. when a INSERT, DELETE,UPDATE process spend a lot of time waiting in 'HADR_SYNC_COMMIT', it caused a chain reactions for others to wait.

Before I submit a ticket to MS feedback, i would like to get some info from you see if this is relating to something simple. If not, i will certainly submit it to MS feedback.

Answer 3

Hi @Jiong AN ,

Why my db busy? I see a lot of 'HADR_SYNC_COMMIT' and 'HADR_GROUP_COMMIT'. this could be contributed from the network latency. when a INSERT, DELETE,UPDATE process spend a lot of time waiting in 'HADR_SYNC_COMMIT', it caused a chain reactions for others to wait.

Yes. You are right.
So what you need to do is to upgrade your network.
Similar case: hadrsyncommit-wait-types

And more information: sys-dm-os-wait-stats-transact-sql

HADR_GROUP_COMMIT: Transaction commit processing is waiting to allow a group commit so that multiple commit log records can be put into a single log block. This wait is an expected condition that optimizes the log I/O, capture, and send operations.

HADR_SYNC_COMMIT: Waiting for transaction commit processing for the synchronized secondary databases to harden the log. This wait is also reflected by the Transaction Delay performance counter. This wait type is expected for synchronized availability groups and indicates the time to send, write, and acknowledge log to the secondary databases.

BR,
Mia

If the answer is helpful, please click "Accept Answer" and upvote it.

Jiong AN 146 Reputation points

2020-10-27T11:31:50.51+00:00

Hi Mia, thanks again.

Although there are evident shows the network delay might play a big part of if. I will still submit a ticket to MS professional supports to see if by any chance will be also an issue that you mentioned above. "This issue occurs because of a deadlock condition between a redo thread and a switch-role thread." given that my error logs and VM system resource is matching to the symptoms that you described. in that case i might needs MS to provide me some kind of patch for my sql server version. Thanks again for your inputs.

Answer 4

m 4,276

Hi @Jiong AN ,

And if there is one big transaction on your primary is running, you can consider optimize the t_sql or execute them in batch.

BR,
Mia

If the answer is helpful, please click "Accept Answer" and upvote it.

Answer 5

Jiong AN 146

hi Mia
Thanks again, the transaction was relatively simple INSERT or DELETE, UPDATEs, i have also checked the table rows also small, few hundred rows. Its just a 'HADR_SYNC_COMMIT' that takes time, I have also run query to show total number of waits vs wait type. The HARD*_COMMIT type waits is about more than 50% of total wait time. At times, i took db out of the AG, it just takes queries a lot faster just by itself without Alwayson.

Share via

Why ALWAYSON AG goes to 'NOT SYNCRONIZING' when db are busy

4 additional answers

Your answer