AlwaysON cluster problem

Question

Hram Admin

Hello!

I'm currently investigating an issue with my AlwaysON cluster. Occasionally, out of the blue, my nodes start throwing the following error (among others):

At the same time, the node's Application log shows an error that reads roughly: "I can't determine the state of your database: is it Primary or Secondary?"

The cluster consists of two nodes: node1 in subnet1 and node2 in subnet2; the file share witness is in subnet1. Availability mode = asynchronous, failover mode = manual. All users are in subnet1.
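To make the quorum arithmetic behind Q1 explicit, here is a minimal sketch of node-and-file-share-majority voting for this layout. It is a simplified model, not how WSFC is implemented: it assumes each node and the witness carry exactly one vote, ignores dynamic quorum/dynamic witness, and assumes the file share witness stays reachable from node1. The names `VOTERS`, `has_quorum` and `partitions_when_link_down` are hypothetical.

```python
# Simplified, hypothetical model of node-and-file-share-majority quorum.
# Assumptions: one vote per node and per witness, no dynamic quorum,
# and the file share witness (fsw) is reachable from everything in subnet1.

VOTERS = {"node1": "subnet1", "node2": "subnet2", "fsw": "subnet1"}


def has_quorum(reachable_voters: set, total_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of all configured votes."""
    return len(reachable_voters) > total_votes // 2


def partitions_when_link_down(voters: dict) -> dict:
    """Group voters by subnet, modelling a broken link between the subnets."""
    groups: dict = {}
    for name, subnet in voters.items():
        groups.setdefault(subnet, set()).add(name)
    return groups


if __name__ == "__main__":
    total = len(VOTERS)  # 3 votes in total, so a majority needs 2
    for subnet, members in sorted(partitions_when_link_down(VOTERS).items()):
        state = "quorum" if has_quorum(members, total) else "no quorum"
        print(subnet, sorted(members), state)
```

Under these assumptions the subnet1 side (node1 + witness, 2 of 3 votes) keeps quorum, while node2 alone (1 of 3) does not.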

What bothers me the most is this:

Q1) Am I correct that, with this cluster configuration, users in subnet1 connecting to node1 (also in subnet1) should NOT experience any problems when the connection between the two subnets is broken, since quorum can still be maintained in that case?

I mean that the following should NOT happen, because quorum is NOT lost in my case:

"if connectivity monitoring fails for 10 seconds, the failover Threshold is reached resulting in the unreachable that node being removed from cluster membership. This results in the resources being moved to another available node on the cluster."

https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/iaas-sql-failover-cluster-network-thresholds?source=recommendations

I'm asking because most occurrences of this issue (NOT all!) happened while the link between the subnets was broken.
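The "10 seconds" in the quoted article is the heartbeat delay multiplied by the heartbeat threshold. Because node1 and node2 sit in different subnets, the cross-subnet settings are the ones that govern their heartbeats. The sketch below computes that window. The values are ASSUMED defaults for Windows Server 2016 and later (1000 ms delay, threshold 10 same-subnet / 20 cross-subnet); the real values on a given cluster can be read with `Get-Cluster | Format-List *Subnet*` in PowerShell. `HeartbeatSettings` and `applicable_settings` are hypothetical names.

```python
# Sketch: how long heartbeats may be missed before a node is treated as down.
# The numbers are ASSUMED defaults (Windows Server 2016+); verify on your cluster.

from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatSettings:
    delay_ms: int   # interval between heartbeats, in milliseconds
    threshold: int  # consecutive missed heartbeats tolerated

    def timeout_seconds(self) -> float:
        """Missed-heartbeat window = delay x threshold, in seconds."""
        return self.delay_ms * self.threshold / 1000


SAME_SUBNET = HeartbeatSettings(delay_ms=1000, threshold=10)   # assumed default
CROSS_SUBNET = HeartbeatSettings(delay_ms=1000, threshold=20)  # assumed default


def applicable_settings(subnet_a: str, subnet_b: str) -> HeartbeatSettings:
    """Node pairs in different subnets are governed by the cross-subnet settings."""
    return SAME_SUBNET if subnet_a == subnet_b else CROSS_SUBNET


if __name__ == "__main__":
    settings = applicable_settings("subnet1", "subnet2")  # node1 <-> node2
    print(f"node1 <-> node2 timeout: {settings.timeout_seconds():.0f} s")
```

With these assumed defaults, node1 and node2 would have to lose heartbeats for about 20 seconds, not 10, before one of them is considered unreachable.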

Q2) According to the logs, the root cause seems to be the network: node1 logs errors stating roughly "Cluster network resource is down", followed by the error shown above. However, the last time it happened, 1) no users reported any issues and 2) I myself was working on node1 via RDP and did not notice any network-related issues. In theory it could be caused by the problem described in the article above, but if that is really the case it is rather strange: the failover threshold was reached, yet no one (including me, working from subnet2!) noticed anything.

Regards,
Michael

1 answer

Answer 1

Javier Villegas (MVP)

Hi @Hram Admin

I believe you have a problem with quorum. In my case, I fixed it by switching to a cloud witness, so even when a node is down there are still 2 votes (the available node + the cloud witness).

Regards

Javier

Hram Admin

2025-04-21T07:39:36.64+00:00

Hi Javier Villegas,

Regarding switching to a cloud witness: theoretically that could help, but for now my main concern is the concept itself (I'm assuming here that the file share witness in subnet1 is OK):

"am I correct that with this cluster configuration users in subnet1 connecting to node1 (also in subnet1) should NOT experience any problems when the connection between the two subnets is broken (as the quorum in this case still can be maintained)"?