Why can't the Cluster Manager connect to the local computer as the head node of a fresh install of HPC Pack 2019 Update 1 configured with a single head node?

Nate Anderson 5 Reputation points
2023-08-30T17:07:02.8333333+00:00

I've prepared a fresh install of HPC Pack Update 1 on a Windows 2022 Server but when the installation finished, and I try to run the Cluster Manager it fails with the following error:

The connection to the management service failed. detail error: Microsoft.Hpc.RetryCountExhaustException: Retry Count of RetryManager is exhausted. ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.HttpClientExtension.<>c__DisplayClass5_0.<<GetHttpApiCallAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__33`1.MoveNext()
   --- End of inner exception stack trace ---
   at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__33`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.HttpClientExtension.<GetHttpApiCallAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.HpcFabricRestContext.<>c__DisplayClass17_1`1.<<GetFirstHttpApiCallResponseAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.HpcFabricRestContext.<GetFirstHttpApiCallResponseAsync>d__17`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.HpcFabricRestContext.<ResolveSingletonServicePrimaryAsync>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.ServiceResolverExtension.<ResolveSdmNodeAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.SystemDefinitionModel.Store.SdmRemoteStore.SdmRemoteStoreConnector.<ConnectAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.SystemDefinitionModel.Store.SdmRemoteStore.<ConnectAsync>d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.SystemDefinitionModel.Store.SdmRemoteStore.CheckConnection()
   at Microsoft.SystemDefinitionModel.Store.SdmRemoteStore.<>c__DisplayClass45_0.<QueryDocuments>b__0()
   at Microsoft.SystemDefinitionModel.Store.SdmRetry.<>c__DisplayClass4_0`1.<InvokeWithRetry>b__0()
   at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__33`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.SystemDefinitionModel.Store.SdmRetry.InvokeWithRetry[T](Func`1 function, CancellationToken cancellationToken)
   at Microsoft.SystemDefinitionModel.ModelDocuments.LoadAllDocuments()
   at Microsoft.SystemDefinitionModel.DefinitionSpaceView.TryResolve(String simpleName)
   at Microsoft.SystemDefinitionModel.ModelDocuments..ctor(Model model)
   at Microsoft.SystemDefinitionModel.Model.InitializeModel()
   at Microsoft.ComputeCluster.Management.ClusterManager.ConnectCore(Model model)
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.RetryManager.<>c__DisplayClass34_0.<<InvokeWithRetryAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__33`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ComputeCluster.Management.ClusterManager.<ConnectAsync>d__90.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ComputeCluster.Management.ClusterManager.<ConnectAsyncWithModel>d__89.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ComputeCluster.Management.ClusterManager.<ConnectAsync>d__87.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ComputeCluster.Admin.ConnectionManager.<ConnectManagementService>b__10_0(Task`1 t)
Windows for business | Windows Server | User experience | Other
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Limitless Technology 44,766 Reputation points
    2023-09-01T08:30:22.9633333+00:00

    Hello there,

    You need to ensure that the same service account is used throughout the nodes of the same cluster. First thing to check is that the Windows Cluster account is active and working fine.

    Make sure that the following ports are open to cluster traffic on any firewalls:

    Port 135: Remote procedure call (RPC) endpoint mapper or distributed component object model (DCOM).

    Port 135: RPC endpoint mapper over user datagram protocol (UDP).

    Port 3343: Cluster network driver.

    Port 445: Server message block (SMB).

    Port 139: NetBIOS session service.

    Hope this resolves your Query !!

    --If the reply is helpful, please Upvote and Accept it as an answer--


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.