Hi Team,
Environment: Microsoft HPC Pack 2019 (head node and compute node)
I am trying to submit an MPI job that runs on the head (master) node and the compute node. The first time, I was able to submit the MPI job and get output.
However, on the next run I get the following error:
Error from node: HEADNODE:System.ServiceModel.FaultException`1[Microsoft.Hpc.ExceptionWrapper]: The security database on the server does not have a computer account for this workstation trust relationshipException of type 'Microsoft.Hpc.Activation.NodeManagerException' was thrown. (Fault Detail is equal to Microsoft.Hpc.ExceptionWrapper).
I then checked the event logs and found:
An unexpected exception occurred. For more information about this exception, see the Details tab.
Additional data:
We can't sign you in with this credential because your domain isn't available. Make sure your device is connected to your organization's network and try again. If you previously signed in on this device with another credential, you can sign in with that credential.
Exception detail: System.Security.SecurityException: We can't sign you in with this credential because your domain isn't available. Make sure your device is connected to your organization's network and try again. If you previously signed in on this device with another credential, you can sign in with that credential.
at System.Security.Principal.WindowsIdentity.KerbS4ULogon(String upn, SafeAccessTokenHandle& safeTokenHandle)
at System.Security.Principal.WindowsIdentity..ctor(String sUserPrincipalName, String type)
at System.Security.Principal.WindowsIdentity..ctor(String sUserPrincipalName)
at Microsoft.Hpc.Diagnostics.Controller.Utilities.ImpersonateWhenDomainJoinedT
at Microsoft.Hpc.Diagnostics.Controller.Utilities.CreateJob(ISchedulerStore store, String requestedBy, StoreProperty[] jobProps)
at Microsoft.Hpc.Diagnostics.Controller.PreStepFinishedHandler.ScheduleRunWithTaskResult(DiagnosticTestRun testRun, DiagnosticTest test, StepResult result)
at Microsoft.Hpc.Diagnostics.Controller.PreStepFinishedHandler.ExecuteInternal(DiagnosticTestRun testRun)
at Microsoft.Hpc.Diagnostics.Controller.StateHandlerBase.Execute()
Now both the master and compute nodes have been removed from the domain automatically. We rejoined both servers to the domain and tested the MPI job again, but we get the same problem: both servers are removed from the domain automatically again.
Please help us resolve this issue.
Regards,
Zain