Windows Failover Clustering - SQL Server - Latency on Connection

HWilliams-2457 116 Reputation points
2022-03-30T18:02:50.397+00:00

Hello,

We've just set up a two-node Windows Server Failover Cluster (WSFC) on Windows Server 2019 across AWS regions (Oregon and Ohio), using SIOS DataKeeper to replicate the files. We installed two SQL Server 2017 instances on the cluster. We adjusted the heartbeat settings to be more forgiving of a multi-subnet configuration, with the cross-subnet settings at 2 seconds instead of 1. (We run 4 other clusters across AWS Availability Zones (AZs) with no problems; this is our first cross-region cluster.)

There were no issues during setup. The two roles can easily move between the two hosts with no hesitation or errors.

The problem we're having is that, sporadically, connections to the SQL instances take 5-10 seconds to establish. A tool like SQL Server Management Studio will always connect eventually, but some things time out. Annoyingly, if you disconnect and reconnect, it will sometimes reconnect instantly. Other times, it will repeatedly take the 5-10 seconds. Again, sporadically.

We don't see this behavior on any of our cross-AZ clusters, only on this new cross-region cluster. There are no errors or warnings in the Windows logs or in any of the Failover Clustering logs, including verbose.

Anybody have any ideas or suggestions?

Thanks.

Windows for business Windows Server Storage high availability Clustering and high availability
SQL Server Other

2 answers

  1. Limitless Technology 39,916 Reputation points
    2022-04-06T10:50:24.337+00:00

    Hi @HWilliams-2457

    There are some brilliant help documents over at the AWS Documentation server. I'm not able to link them from here, but the steps you should take using their documentation are:

    Step 1: Gather data about the issue
    Step 2: Check the environment
    Step 3: Examine the log files
    Step 4: Check cluster and instance health
    Step 5: Check for suspended groups
    Step 6: Review configuration settings
    Step 7: Examine input data

    Every Amazon EMR cluster reports metrics to CloudWatch. You should use this tool to troubleshoot.

    I hope this answers your question.

    Thanks.



  2. Chris Smith 1 Reputation point
    2022-08-24T17:16:04.48+00:00

    Are you connecting using a listener? If so, you need to specify something like MultiSubnetFailover=True (the exact keyword and value depend on the driver) in the connection string. Otherwise, the client may try to connect to the node that is offline. https://learn.microsoft.com/en-us/troubleshoot/sql/availability-groups/listener-connection-times-out
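
To illustrate the connection-string suggestion above, here is a minimal Python sketch, not a definitive fix. It assumes the Microsoft ODBC driver, where the keyword is `MultiSubnetFailover` with the value `Yes` (the .NET SqlClient equivalent is `MultiSubnetFailover=True`). The host name `sqlclus01.example.com`, the port, the database name, and both helper function names are hypothetical. The second helper simply lists the IP addresses a name resolves to, which can help confirm whether the clustered name is registered with one address per subnet.

```python
import socket


def build_odbc_connection_string(server, database,
                                 driver="ODBC Driver 17 for SQL Server",
                                 multi_subnet_failover=True):
    """Build an ODBC connection string for a multi-subnet SQL Server name.

    Assumption: Windows integrated authentication (Trusted_Connection=Yes).
    """
    flag = "Yes" if multi_subnet_failover else "No"
    parts = [
        f"Driver={{{driver}}}",          # renders as Driver={ODBC Driver 17 ...}
        f"Server={server}",
        f"Database={database}",
        "Trusted_Connection=Yes",
        f"MultiSubnetFailover={flag}",   # ODBC spelling of the keyword
    ]
    return ";".join(parts) + ";"


def resolve_addresses(hostname, port=1433):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical clustered network name and port; replace with your own.
    conn_str = build_odbc_connection_string("tcp:sqlclus01.example.com,1433", "master")
    print(conn_str)
    try:
        print(resolve_addresses("sqlclus01.example.com"))
    except socket.gaierror as exc:
        print(f"Name did not resolve: {exc}")
```

If the name resolves to two addresses (one per region) and the client does not set the multi-subnet option, the driver may try the addresses one after another, which would be consistent with an intermittent delay of several seconds. The resulting string could be passed to a client library such as pyodbc (an assumption; it is not imported here).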

