HPC Pack 2019 - Unable to connect to the head node running cluster manager

Jann Jutzi 1 Reputation point
2022-11-22T15:15:50.707+00:00

When I start HPC Cluster Manager locally I receive this error.

The connection to the scheduler service failed. detail error: System.AggregateException: One or more errors occurred. ---> Microsoft.Hpc.RetryCountExhaustException: Retry Count of RetryManager is exhausted. ---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 500 (Internal Server Error).
at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
at Microsoft.Hpc.HttpClientExtension.<>c__DisplayClass5_0.<<GetHttpApiCallAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__331.MoveNext() --- End of inner exception stack trace --- at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__331.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.HttpClientExtension.<GetHttpApiCallAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.HpcFabricRestContext.<>c__DisplayClass15_11.<<GetFirstHttpApiCallResponseAsync>b__0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.Hpc.HpcFabricRestContext.<GetFirstHttpApiCallResponseAsync>d__151.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.HpcFabricRestContext.<ResolveSingletonServicePrimaryAsync>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.ServiceResolverExtension.<ResolveSchedulerNodeAsync>d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.StoreConnectionContext.<ResolveSchedulerNodeAsync>d__55.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.StoreServer.<ConnectWcfAsync>d__39.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.StoreServer.<InternalConnectAsync>d__38.MoveNext()
--- End of inner exception stack trace ---
at Microsoft.Hpc.Scheduler.Store.StoreServer.<InternalConnectAsync>d__38.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.StoreServer.<ConnectAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.SchedulerStoreSvc.<InitializeAsync>d__41.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.SchedulerStoreSvc.<RemoteConnectAsync>d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.SchedulerStore.<ConnectAsync>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.SchedulerStore.<ConnectAsync>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ComputeCluster.Admin.ConnectionManager.ConnectScheduler(Object sender)
---> (Inner Exception #0) Microsoft.Hpc.RetryCountExhaustException: Retry Count of RetryManager is exhausted. ---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 500 (Internal Server Error).
at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
at Microsoft.Hpc.HttpClientExtension.<>c__DisplayClass5_0.<<GetHttpApiCallAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__331.MoveNext() --- End of inner exception stack trace --- at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__331.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.HttpClientExtension.<GetHttpApiCallAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.HpcFabricRestContext.<>c__DisplayClass15_11.<<GetFirstHttpApiCallResponseAsync>b__0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.Hpc.HpcFabricRestContext.<GetFirstHttpApiCallResponseAsync>d__151.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.HpcFabricRestContext.<ResolveSingletonServicePrimaryAsync>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.ServiceResolverExtension.<ResolveSchedulerNodeAsync>d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.StoreConnectionContext.<ResolveSchedulerNodeAsync>d__55.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.StoreServer.<ConnectWcfAsync>d__39.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.Scheduler.Store.StoreServer.<InternalConnectAsync>d__38.MoveNext()<---

---> (Inner Exception #1) System.Net.Sockets.SocketException (0x80004005): No connection could be made because the target machine actively refused it [fe80::dbf:3f89:6d80:d1b0%16]:5800

Server stack trace:
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.Runtime.Remoting.Channels.RemoteConnection.CreateNewSocket(EndPoint ipEndPoint)
at System.Runtime.Remoting.Channels.SocketCache.GetSocket(String machinePortAndSid, Boolean openNew)
at System.Runtime.Remoting.Channels.Tcp.TcpClientTransportSink.SendRequestWithRetry(IMessage msg, ITransportHeaders requestHeaders, Stream requestStream)
at System.Runtime.Remoting.Channels.Tcp.TcpClientTransportSink.ProcessMessage(IMessage msg, ITransportHeaders requestHeaders, Stream requestStream, ITransportHeaders& responseHeaders, Stream& responseStream)
at System.Runtime.Remoting.Channels.BinaryClientFormatterSink.SyncProcessMessage(IMessage msg)

Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Microsoft.Hpc.Scheduler.Store.ISchedulerStoreInternal.Register(String clientSource, String userName, ConnectionRole role, Version clientVersion, ConnectionToken& token, UserPrivilege& privilege, Version& serverVersion, Dictionary`2& serverProps)
at Microsoft.Hpc.Scheduler.Store.StoreServer.RegisterWithServer()
at Microsoft.Hpc.Scheduler.Store.StoreServer.RegisterEvent(String schedulerNode)
at Microsoft.Hpc.Scheduler.Store.StoreServer.ConnectWithRemoting(CancellationToken token)
at Microsoft.Hpc.Scheduler.Store.StoreServer.<InternalConnectAsync>d__38.MoveNext()<---

**This error occurs almost everytime after rebooting the device.
Until Now i haven't found a Solution except reinstalling the Software everytime..

I have noticed that I am unable to start the HPC Reporting Server too.
It apprears: Error 1053: The service did ot respond to the start or control request in a timely fashion.

Thank you in advance for any help!**

Azure Virtual Machines
Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.
7,105 questions
Azure HPC Cache
Azure HPC Cache
An Azure service that provides file caching for high-performance computing.
23 questions
{count} votes