Hi,
We have set up a Windows HPC cluster environment for running batch jobs in our Production and Non-Prod environments.
Setup:
HPC installation file used: HPCPack2016Update3-Full-Refresh-v6450.zip
A few details about the cluster:
- Windows Server 2016
- Single Head node configuration
- Network Topology 5
- 48 Compute Nodes added to the cluster
- 5 databases set up on a remote DB server
Context:
We initially use the commands below to create and submit the HPC jobs:
Create Command:
$HpcJob = New-HpcJob -JobFile $jobXmlFilePath
[void](Submit-HpcJob -Job $HpcJob -Credential $credential)
After submission, we keep track of the job ID, and whenever the job's state changes to 'Canceled' or 'Failed' we try to requeue it, up to 3 times in total, using the command below.
However, every requeue attempt fails with the error shown at the bottom.
Requeue Command:
[void](Submit-HpcJob -id $jobId -Credential $credential)
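For clarity, the retry flow described above can be sketched roughly as follows. This is an illustrative reconstruction, not the exact code in CommonHPCFunctions.psm1: the function name Invoke-HpcJobWithRetry, the polling interval, and treating 'Finished' as the success state are assumptions. It only restates the retry logic and does not by itself avoid the error below.

```powershell
# Hypothetical helper (not part of HPC Pack): polls a job and requeues it,
# up to $MaxRetries times, whenever it ends in the Failed or Canceled state.
# The name and the default polling interval are illustrative assumptions.
function Invoke-HpcJobWithRetry {
    param(
        [Parameter(Mandatory)] [int] $JobId,
        [Parameter(Mandatory)] [pscredential] $Credential,
        [int] $MaxRetries = 3,
        [int] $PollSeconds = 30
    )
    $attempt = 0
    while ($true) {
        $job = Get-HpcJob -Id $JobId
        switch ($job.State.ToString()) {
            'Finished' { return $true }                      # job completed
            { $_ -in 'Failed', 'Canceled' } {
                if ($attempt -ge $MaxRetries) { return $false }  # retries used up
                $attempt++
                # Same requeue call as in our module
                [void](Submit-HpcJob -Id $JobId -Credential $Credential)
            }
        }
        Start-Sleep -Seconds $PollSeconds                    # wait before polling again
    }
}
```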
Issue:
Sometimes jobs fail and need to be requeued, and we have code in place for that.
However, the requeue block throws the following error:
Submit-HpcJob : Cannot modify the job properties when attempting to requeue the job. For more information, see
http://go.microsoft.com/fwlink/?LinkId=182875.
At H:\HPCScripts\CommonHPCFunctions.psm1:95 char:16
+ [void](Submit-HpcJob -id $jobId -Credential $credential)
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Microsoft.Compu...CPPSH.SubmitJob:SubmitJob) [Submit-HpcJob], SchedulerException
    + FullyQualifiedErrorId : Microsoft.ComputeCluster.CCPPSH.SubmitJob
Note: We have not changed any job properties; we only pass the job ID to requeue the job.
Any guidance on this would be really helpful.
@Sumarigo-MSFT
Regards
Ajay