Removing 'vote' from async secondary replica in AG

SQLRocker 126 Reputation points
2024-01-09T20:55:41.6366667+00:00

We have a 2-node non-FCI AG setup, so there is a primary and a secondary. It is async with manual failover, as we don't want a failover to happen.

Of late we have been having AG failover attempts, which seem network related; to me it looks like a heartbeat problem between the nodes. We also have a file share witness (FSW).

The networking team has been looking but has no result yet, and we keep getting failover attempts, which leads me to my question below.

I have relaxed the cluster timeouts, but if there is still a heartbeat (HB) issue, the primary AG will 'attempt' a failover, which means it goes to PRIMARY_RESOLVING before coming back online on the primary after a couple of minutes. This causes an outage.

What I am thinking of is: can I take the vote off the secondary?

Currently the primary, the secondary and the FSW each have 1 vote.

The most recent outage had this message in the cluster log on the primary: "Quorum witness has better epoch than local node, this node must have been on the losing side of arbitration!"

It seems to me that the primary and secondary lost communication and reached the heartbeat limit (a HB is tried every 2 sec and the threshold is 40, so the timeout is 80 secs in this env). If I understand correctly, the secondary beat the primary to the FSW, hence the message above and the outage.
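To make the timeout arithmetic explicit, here is a minimal Python sketch. It assumes the effective timeout is simply the heartbeat interval multiplied by the missed-heartbeat threshold; the function name is made up for illustration and is not part of any cluster tooling.

```python
def heartbeat_timeout_seconds(delay_ms: int, threshold: int) -> float:
    """Seconds of missed heartbeats tolerated before a node is considered down.

    Assumption: timeout = heartbeat interval (ms) * threshold, nothing else.
    """
    return delay_ms * threshold / 1000


# Values from this env: one heartbeat every 2 sec, threshold of 40.
print(heartbeat_timeout_seconds(2000, 40))  # 80.0
```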

So my question is: can I remove the vote from the secondary, so that only the primary and the FSW have a vote? What are the repercussions of that?

I will be left with an even number of votes (2), which I have read is not recommended. But since I don't want the secondary server to become primary ever, what's the harm?
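A rough way to reason about the two layouts is a plain majority count. The sketch below is only that: it assumes a static "more than half of all votes" rule and does not model the dynamic quorum or dynamic witness behaviour that Windows Server can apply at runtime. The function names are hypothetical.

```python
def majority(total_votes: int) -> int:
    """Smallest number of votes that is more than half of the total."""
    return total_votes // 2 + 1


def has_quorum(votes_held: int, total_votes: int) -> bool:
    """True if one side of a partition holds a majority of all votes."""
    return votes_held >= majority(total_votes)


# Current layout: primary, secondary and FSW hold one vote each (3 total).
print(has_quorum(1, 3))  # primary cut off, FSW claimed by the secondary: False
print(has_quorum(2, 3))  # one node plus the FSW: True

# Proposed layout: vote removed from the secondary (2 total).
print(has_quorum(2, 2))  # primary plus the FSW: True
print(has_quorum(1, 2))  # primary alone, FSW unreachable: False
```

Under this simplified model, removing the secondary's vote takes it out of arbitration, but the primary then needs the FSW to stay reachable in order to keep a majority.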

All servers are on the same subnet. I have had other apps before where the DR site was on a different subnet; there I know you should remove the votes from the DR nodes, and I have no doubts about that. But can the same be done in this env, where it is all one subnet with 2 machines, one meant to be primary and the other not, plus a FSW?

Please let me know what you think, thanks.

SQL Server | Other

2 answers

  1. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2024-01-10T22:59:15.87+00:00

    If you don't want a failover, why have a secondary in the first place? If you are using the secondary only for read-only purposes, maybe it would be better to have a clusterless AG?

  2. SQLRocker 126 Reputation points
    2024-01-11T20:48:24.88+00:00

    Thanks for replying, Erland. Yes, I think it's messy, so I will leave the votes as they are. Just some history: I set up this env years ago. Initially it was automatic failover, but every now and then the AG would fail over and cause an outage, mostly network/clustering related. Then I set it up as async manual, which is how it has been for the past couple of years. But of late we are again getting network HB issues. I increased the SameSubnetDelay and SameSubnetThreshold values recently, but had another outage last week. So the thought of removing the vote came to me, but it seems like it could open another can of worms. I will continue to look into it, and the network team is also looking.
