sql server cluster failover issue.

Heisenberg 261 Reputation points
2021-08-27T05:30:07.163+00:00

We are having issues with our SQL Server cluster failover instance. here is what is happening.

When cluster fails (manually or automatically) and ownership goes to secondary node, i see SQL Server service in "Running" state (on secondary node), SQL Agent stays as it is in stopped state, at the same time I see SQL Server resource in WSFC shows as "online pending" (on second node), it takes a long time in this state. Then it fails and all resources fails back to primary node.

After failing back to primary node WSFC shows SQL Server resource as "online pending" when i go to services i see "SQL server" service is in running state but SQL Agent stays in stopped state. It stays in "online pending" state for a long time just like it does in first failover and then SQL Server resource in WSFC goes into "failed" state.

Then what i do is on the primary node, go to services bring SQL Server service online by starting it manually, SQL Server service comes back up in running state and then i try to bring SQL Server resource in WSFC online by right clicking on it. It shows online pending for some time and then it fails again.

Next what i do is, go to services again and bring SQL Server and SQL Agent both services online one after another, they successfully starts and goes in "running" state. Then i try to bring SQL Server cluster resource online, it comes back online alongwith "sql agent" cluster resource.

Can someone tell me what is going on, why is WSFC not able to bring SQL resources online automatically and only way to bring sql server cluster resource online is by manually starting both sql server and sql server agent services manually and then manually bring cluster resource online.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,736 questions
{count} votes

5 answers

Sort by: Most helpful
  1. Seeya Xi-MSFT 16,436 Reputation points
    2021-08-27T08:46:31+00:00

    Hi @SQLServerBro ,

    only way to bring sql server cluster resource online is by manually starting both sql server and sql server agent services manually and then manually bring cluster resource online.

    No. This is abnormal. Both sql server and sql server agent services are cluster resources. Both sql server and sql server agent services are cluster resources. When the failover is completed, the resource will be online on the active node.

    Please see: https://learn.microsoft.com/en-us/troubleshoot/sql/failover-clusters/cluster-resource-goes-failed-state
    If your problem is not resolved, can you post some error messages like the symptoms in this document?

    Best regards,
    Seeya


    If the response is helpful, please click "Accept Answer" and upvote it, as this could help other community members looking for similar queries.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  2. Heisenberg 261 Reputation points
    2021-08-30T16:23:59.533+00:00

    hello,
    Registry values are already set, pls see below screenshot.

    127556-image.png

    127618-image.png

    0 comments No comments

  3. Seeya Xi-MSFT 16,436 Reputation points
    2021-08-31T07:25:06.477+00:00

    Hi @SQLServerBro,

    Are you setting the value in the parameter of the corresponding cluster resource?

    If you notice that a SQL Server Agent cluster resource cannot be brought online, you must create the same set of resource-specific keys that correspond to the SQL Server Agent cluster resource.

    For more information, please see this doc:
    https://learn.microsoft.com/en-US/troubleshoot/sql/failover-clusters/manually-re-create-resource-specific-registry-keys

    Best regards,
    Seeya

    0 comments No comments

  4. Heisenberg 261 Reputation points
    2021-08-31T20:40:53.053+00:00

    hello Seeya, im not setting any parameter value explicitly, this is how it was when configures the cluster. If you see first screen shot that will show you parent tree value with name as "SQL Server" beneath that i has subfolder with name "parameter" under which you will see values like "instancename" virtual server name and so on.


  5. Ben Miller (DBAduck) 951 Reputation points
    2021-10-01T12:13:23.773+00:00

    Your first line of defense is to find the SQL Server Errorlogs and see what is going on in them. If SQL does not come online in the cluster (you should really never start or stop the SQL Services in Services when it is clustered) then the Errorlogs could give you a good indication of what SQL is doing behind the scenes and if there are failures in the logs, then you can resolve them first.

    0 comments No comments