Microsoft HPC Pack 2016 HPC Management Service does not start

Dmitriy Borovkov 1 Reputation point
2021-04-16T12:37:21.137+00:00

Hi there!
On our HPC cluster, the HPC Management Service on the head node does not start. Its log shows the following exception:

e,04/06/2021 21:23:57.157, SrcFile="HpcManagement" SrcFunc="" SrcLine="0" Pid="7316" Tid="2740" TS="0x01d72b2b272d1285" String1="[HpcManagement] Exception:.System.ArgumentException: macAddress.. at Microsoft.ComputeCluster.Management.MacIpPair..ctor(String macAddress, String[] ipAddresses).. at Microsoft.ComputeCluster.Management.MachineIdentifier.AddIPPair(String macAddress, String[] IpAddresses).. at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.get_FullIdentifier().. at Microsoft.ComputeCluster.Management.HpcClusterManager.UpdateGpuNodesWithGroup(IdentifiableInstance cluster, ModelQuery query).. at Microsoft.ComputeCluster.Management.HpcClusterManager.PopulateComputeNodeList().. at Microsoft.ComputeCluster.Management.HpcClusterManager.<Initialize>d__25.MoveNext()"
e,04/06/2021 21:23:57.172, SrcFile="HpcManagement" SrcFunc="" SrcLine="0" Pid="7316" Tid="3432" TS="0x01d72b2b272f7514" String1="[HpcManagement] HPC Management service fails to start: System.AggregateException: One or more errors occurred. ---> System.ArgumentException: macAddress.. at Microsoft.ComputeCluster.Management.MacIpPair..ctor(String macAddress, String[] ipAddresses).. at Microsoft.ComputeCluster.Management.MachineIdentifier.AddIPPair(String macAddress, String[] IpAddresses).. at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.get_FullIdentifier().. at Microsoft.ComputeCluster.Management.HpcClusterManager.UpdateGpuNodesWithGroup(IdentifiableInstance cluster, ModelQuery query).. at Microsoft.ComputeCluster.Management.HpcClusterManager.PopulateComputeNodeList().. at Microsoft.ComputeCluster.Management.HpcClusterManager.<Initialize>d__25.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementHeadNodeService.<StartService>d__7.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementServiceBase.<Start>d__4.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementHeadNodeNtService.<StartService>d__4.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. 
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementServiceBase.<Start>d__4.MoveNext().. --- End of inner exception stack trace ---.. at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions).. at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken).. at System.Threading.Tasks.Task.Wait().. at Microsoft.ComputeCluster.Management.ManagementWinService.OnStart(String[] args)..---> (Inner Exception #0) System.ArgumentException: macAddress.. at Microsoft.ComputeCluster.Management.MacIpPair..ctor(String macAddress, String[] ipAddresses).. at Microsoft.ComputeCluster.Management.MachineIdentifier.AddIPPair(String macAddress, String[] IpAddresses).. at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.get_FullIdentifier().. at Microsoft.ComputeCluster.Management.HpcClusterManager.UpdateGpuNodesWithGroup(IdentifiableInstance cluster, ModelQuery query).. at Microsoft.ComputeCluster.Management.HpcClusterManager.PopulateComputeNodeList().. at Microsoft.ComputeCluster.Management.HpcClusterManager.<Initialize>d__25.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementHeadNodeService.<StartService>d__7.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. 
at Microsoft.ComputeCluster.Management.ManagementServiceBase.<Start>d__4.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerN"
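The trace above is hard to read because it is flattened onto a few lines. A minimal sketch for expanding it, assuming the logger replaced each line break with `..` (this is inferred from the `.. at ` pattern, not confirmed by HPC Pack documentation); `unflatten_trace` is a hypothetical helper name, not part of any HPC Pack tooling:

```python
import re


def unflatten_trace(message):
    """Split a flattened .NET stack trace into one entry per line.

    Assumption: each original line break was written as "..", so a frame
    boundary looks like ".. at " or "..---". A bare ".." is not treated as
    a boundary, because it also occurs inside names such as
    "MacIpPair..ctor" (the constructor frame).
    """
    # Replace only the ".." that is followed by a frame ("at ") or a
    # separator ("---") with a real newline.
    expanded = re.sub(r"\.\.\s*(?=at |---)", "\n", message)
    # Drop empty entries and surrounding whitespace.
    return [line.strip() for line in expanded.splitlines() if line.strip()]
```

Applied to the `String1` value of the first log entry, this should put `System.ArgumentException: macAddress` at the top with the `MacIpPair..ctor` frame directly under it, which suggests the failure comes from the MAC address passed to that constructor while the compute node list is populated.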

It started after a reboot of the head node that installed updates (KB5000803, if I remember correctly).
I have since removed that update, but that did not resolve the issue.

Azure Virtual Machines