Share via

SQL Server instance on Always On cluster unresponsive

2023-10-24T19:58:13.72+00:00

We have a SQL Server 2017 Always On cluster with typical 2 x HA nodes + 1 x DR node. The cluster has multiple SQL instances. Not all the instances host any databases at the moment. All three nodes on physical hardware. Recently we had an application outage where the app could not access it's DBs. We tried to access the instance in SSMS but the instance would not expand. However, we could expand the other instances. We could only access the instance after restarting it. Our storage and network teams did not report and issues during the application outage. Can you suggest what areas to look which might explain the instance being hung? How can be monitor and alert on SQL instance and DBs not accessible? For example, could we use replica synchronization as a surrogate to alert on the instance not available? Any other internals of the SQL level or OS level to monitor? Thanks.

SQL Server | Other
SQL Server | Other

Additional SQL Server features and topics not covered by specific categories


2 answers

Sort by: Most helpful
  1. yunus emre ISIK 281 Reputation points
    2023-10-25T04:55:03.1+00:00

    In addition to Erland's suggestions, I suggest you to enable critical alerts and also notifications for these alerts. (security level alerts 17,18~25 , 825,835 etc)

    you can read this article:

    https://www.mssqltips.com/sqlservertip/3384/configuring-critical-sql-server-alerts/

    Was this answer helpful?

    0 comments No comments

  2. Erland Sommarskog 134.7K Reputation points MVP Volunteer Moderator
    2023-10-24T21:17:31.9833333+00:00

    The first place to check what might have happened is the SQL Server errorlog. Since you have restarted SQL Server, that would be ERRORLOG.1.

    When an issue like this occurs, you should look in the current errorlog. Since the instance is not responsive, you would need to open it from the file system.

    There might still be a possibility to connect to the instance in a situation like this. You can connect on the Dedicated Admin Connection (DAC) which has a reserved scheduler, which should respond even if the server has run out of schedulers. By the default, the DAC is only accessible locally, so you need to RDP into the machine. In SSMS you say ADMIN:.\INSTANCE. Beware that the DAC is only one single connection, so connect with a query window, not with Object Explorer. Also, you are likely to get an error when you connect, because the connection for intellisense will fail. Once connected you query DMVs, check activity etc, and you can kill processes you think are dispensible.

    Then again, if the errorlog has a dump about hung schedulers, you may have no choice but to restart SQL Server.

    To have a check that SQL Server is running, you could write a short Powershell script that runs from Task Scheduler every five minutes or so, and which sends a mail if connection fails.

    Was this answer helpful?

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.