AlwaysOn cluster problem

Hram Admin 170 Reputation points
2025-04-14T10:04:35.68+00:00

Hello!

I'm currently investigating an issue with my AlwaysOn cluster: occasionally, seemingly out of the blue, my nodes start throwing the following error (among others):

[Error screenshot not preserved]

At the same time, the node's Application log shows an error that reads, roughly: "I can't understand the state of your database, whether it is Primary or Secondary."

The cluster consists of two nodes: node1 in subnet1 and node2 in subnet2. The file share witness is in subnet1. Availability mode is asynchronous commit and failover mode is manual. All users are in subnet1.
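The vote arithmetic behind this layout (and behind Q1 below) can be sketched in a few lines of Python. This is a simplified model, not the cluster service's code: it assumes one vote each for node1, node2 and the file share witness (FSW), ignores dynamic quorum and dynamic witness adjustments, and assumes that when the inter-subnet link fails each side can only reach the voters in its own subnet. The names `VOTERS`, `has_quorum` and `partition_votes` are hypothetical.

```python
# Minimal sketch of node-majority-plus-witness quorum arithmetic.
# Assumptions (hypothetical model, not the cluster service's actual logic):
#   - node1, node2 and the file share witness (FSW) hold one vote each;
#   - dynamic quorum / dynamic witness vote adjustments are ignored;
#   - during a link failure, each side only reaches voters in its own subnet.

VOTERS = {"node1": "subnet1", "node2": "subnet2", "fsw": "subnet1"}


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of all votes."""
    return reachable_votes > total_votes // 2


def partition_votes(subnet: str) -> int:
    """Votes a partition can count when the inter-subnet link is down."""
    return sum(1 for location in VOTERS.values() if location == subnet)


total = len(VOTERS)
for subnet in ("subnet1", "subnet2"):
    votes = partition_votes(subnet)
    print(f"{subnet}: {votes}/{total} votes, quorum={has_quorum(votes, total)}")
```

Under these assumptions the subnet1 side keeps 2 of 3 votes and retains quorum, while node2 alone holds 1 of 3 and loses it.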

What bothers me the most is this:

Q1) Am I correct that, with this cluster configuration, users in subnet1 connecting to node1 (also in subnet1) should NOT experience any problems when the connection between the two subnets is broken, since quorum can still be maintained in this case?

In other words, the following behavior, quoted from the article linked below, should NOT happen, because quorum is NOT lost in my case:

"if connectivity monitoring fails for 10 seconds, the failover Threshold is reached resulting in the unreachable that node being removed from cluster membership. This results in the resources being moved to another available node on the cluster."

https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/iaas-sql-failover-cluster-network-thresholds?source=recommendations

I'm asking because most occurrences of this issue (but NOT all!) happened while the link between the subnets was down.
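The 10 seconds in the quoted text is the product of two cluster settings: the delay between heartbeats and the number of heartbeats that may be missed. Because node1 and node2 sit in different subnets, heartbeats between them are governed by the CrossSubnetDelay and CrossSubnetThreshold settings rather than the SameSubnet pair. The sketch below assumes the values commonly documented as Windows Server 2016 and later defaults; the real values on a given cluster can be listed with `Get-Cluster | Format-List *Subnet*` in PowerShell. `SETTINGS` and `tolerance_seconds` are hypothetical names.

```python
# Sketch: how long heartbeats can be lost before a node is removed from membership.
# The values below are ASSUMED defaults (Windows Server 2016 and later);
# read the real ones from your own cluster before drawing conclusions.

SETTINGS = {
    # name: (delay between heartbeats in ms, missed heartbeats tolerated)
    "SameSubnet": (1000, 10),
    "CrossSubnet": (1000, 20),
}


def tolerance_seconds(delay_ms: int, threshold: int) -> float:
    """Heartbeat-loss window = delay between heartbeats x missed-heartbeat threshold."""
    return delay_ms * threshold / 1000


for name, (delay_ms, threshold) in SETTINGS.items():
    print(f"{name}: {tolerance_seconds(delay_ms, threshold):.0f} s")
```

With these assumed values, the same-subnet window matches the quoted 10 seconds, and the cross-subnet window between node1 and node2 would be 20 seconds.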

Q2) According to the logs, the root cause appears to be the network: node1 logs errors stating, roughly, "Cluster network resource is down", followed by the error shown above. However, the last time it happened, 1) no users reported any issues, and 2) I was working on node1 via RDP myself and did not notice any network problems. In theory it could be the threshold issue described in the article above, but if so, it is rather strange: the failover threshold was reached, yet nobody (including me, connecting from subnet2!) noticed anything.
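One way to reason about Q2 is a simplified model of the threshold in which only an unbroken run of missed heartbeats counts, not overall packet loss. This is an assumption for illustration, not the cluster service's documented algorithm; `first_removal` and the traces are hypothetical, and one heartbeat per second with a threshold of 20 are the assumed cross-subnet defaults. Under this model, a single 20-second gap is enough to cross the threshold, while heavier but scattered loss never does, which may help explain how a threshold can be reached without anyone noticing a sustained outage.

```python
# Simplified model (an assumption, not the cluster service's actual algorithm):
# a node is removed once `threshold` consecutive heartbeats are missed.
from typing import Optional


def first_removal(heartbeats: list[bool], threshold: int) -> Optional[int]:
    """Return the index of the heartbeat at which removal would trigger, or None.

    heartbeats[i] is True if heartbeat i arrived, False if it was lost.
    """
    consecutive_misses = 0
    for i, arrived in enumerate(heartbeats):
        # Any heartbeat that arrives resets the run of misses.
        consecutive_misses = 0 if arrived else consecutive_misses + 1
        if consecutive_misses >= threshold:
            return i
    return None


# Assumed: one heartbeat per second, cross-subnet threshold of 20.
healthy = [True] * 60
scattered_loss = [i % 3 != 0 for i in range(60)]         # 1 in 3 lost, never consecutive
short_outage = [True] * 20 + [False] * 20 + [True] * 20  # a single 20-second gap

for name, trace in [("healthy", healthy),
                    ("scattered_loss", scattered_loss),
                    ("short_outage", short_outage)]:
    print(name, first_removal(trace, threshold=20))
```

Only the short outage triggers removal (at heartbeat index 39), even though the scattered trace loses a third of all heartbeats.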

Regards,
Michael

SQL Server Database Engine

1 answer

  1. Javier Villegas 905 Reputation points MVP
    2025-04-17T22:54:21.8933333+00:00

    Hi @Hram Admin

    I believe you have a problem with quorum. In my case, I fixed it by switching to a cloud witness, so that even when one node is down there are still 2 votes (the available node plus the cloud witness).

    Regards

    Javier
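The reasoning in this answer can be checked with the same kind of vote arithmetic. The sketch assumes one vote each for node1, node2 and the witness, ignores dynamic quorum, and assumes the cloud witness is reachable from subnet2; the scenario names and `has_quorum` are hypothetical. It suggests that the file share witness already covers a single node failure while the link is up, and that the practical gain of a cloud witness is that it does not share a failure domain with subnet1.

```python
# Sketch comparing witness placement for the two-node cluster in the question.
# Assumptions: one vote each for node1, node2 and the witness; dynamic quorum is
# ignored; a cloud witness is reachable from subnet2. Scenario names are hypothetical.

TOTAL_VOTES = 3  # node1 + node2 + witness


def has_quorum(surviving_votes: int) -> bool:
    """Strict majority of all configured votes."""
    return surviving_votes > TOTAL_VOTES // 2


# Voters still available to node2 in each scenario.
SCENARIOS = {
    "FSW in subnet1, subnet1 outage (node1 + FSW lost)": ["node2"],
    "Cloud witness, subnet1 outage (node1 lost)": ["node2", "cloud_witness"],
    "FSW in subnet1, node1 down, link up": ["node2", "fsw"],
}

for name, survivors in SCENARIOS.items():
    votes = len(survivors)
    print(f"{name}: {votes}/{TOTAL_VOTES} votes, quorum={has_quorum(votes)}")
```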

