SQL Server Instance Service could not start on some specific nodes

Christian Mühleder 0 Reputation points
2023-09-28T06:02:25.03+00:00

I tried to remove multiple SQL Server Failover Cluster Instances on a single Failover Cluster Node. Therefore i wrote a powershell script to run SQL Server's setup.exe via Invoke-Command cmdlet from a remote computer, because you have to restart the node after each instance removal.

But i think the problem was that i ran into powershell's "second hop" problem. The Setup exe wasn't able to retrieve information about other nodes instances, as you can see it below from the SQL Server install log:

...

Slp: Running Action: RunRemoteDiscoveryAction

Slp: Running discovery on local machine

Slp: Running discovery on remote machine: SQLNODE1

Slp: Running discovery on remote machine: SQLNODE2

Slp: Running discovery on remote machine: SQLNODE3

Slp: Discovery on local machine is complete

Slp: Discovery on SQLNODE1 failed due to exception

Slp: Microsoft.SqlServer.Management.Sdk.Sfc.EnumeratorException: Failed to retrieve data for this request. ---> Microsoft.SqlServer.Configuration.Sco.ScoException: Attempted to perform an unauthorized operation. ---> System.UnauthorizedAccessException: Attempted to perform an unauthorized operation.

...

Slp: Discovery failed due to access denied, setup will continue running.

...

But the setup does not stop here, it proceeds and decides that if i can't see the instances of the other nodes than this must be the last node with this instance, but this was wrong. The setup sets the "Last Node Flag" (FLastNode) to True and removes all cluster ressources and their dependencies of this instance. The Instance was down now and can't be started on any other node.

After we added the ressources and their dependencies manually in the "Failover Cluster Manager" the instance could run successfully on 2 of our 4 node cluster, but we can't get the instance running on any of the remaining 2 nodes.

We already tried to remove and add the instance via the SQL Server Server Setup, but it won't work. Everytime we do a failover to one of the two nodes the SQL Server Service can't start.

The Windows Event Log says: "The cluster service failed to bring clustered role 'SQL Server (MyInstance)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role."

As far as i can see the network name and disks work fine, but the SQL Service fails. Where can i get more information for the reason why the instance can't start?

Do anyone have a similar case or a good hint for fixing this issue?

Also the SQL Server errorlog does not provide any additional information.

P.S. We already thought about it to remove the instance completly on every node and start with a clean re-add, but this is a production instance and at the moment there isn't a possibility to do that.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,629 questions
0 comments No comments
{count} votes

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.