
Not able to requeue jobs in Windows HPC 2016

Attuchirayil, Ajay 21 Reputation points
2022-12-23T00:07:47.747+00:00

Hi,

We have set up a Windows HPC cluster environment for running batch jobs in our Production and Non-Prod environments.

Setup:

HPC installation file used: HPCPack2016Update3-Full-Refresh-v6450.zip
A few details about the cluster:

  • Windows Server 2016
  • Single Head node configuration
  • Network Topology 5
  • 48 Compute Nodes added to the cluster
  • 5 databases setup in a remote DB server

Context:

We initially use the commands below to create and submit an HPC job.

Create Command:

$HpcJob = New-HpcJob -JobFile $jobXmlFilePath
[void](Submit-HpcJob -Job $HpcJob -Credential $credential)

After this, we keep track of the job ID, and whenever the job's state changes to Canceled or Failed we try to requeue it, up to 3 times in total, using the command below.
Each attempt fails with the error shown under "Issue".

Requeue Command:

[void](Submit-HpcJob -id $jobId -Credential $credential)
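
For reference, the retry policy described above (requeue on Canceled or Failed, at most 3 times per job) could be sketched roughly as follows. This is only an illustration of the bookkeeping, not the actual contents of CommonHPCFunctions.psm1, and it does not address the error reported below. The function name `Request-JobRequeue`, the `$RequeueCounts` table, and the returned status strings are hypothetical; the requeue action is passed in as a script block so the counting logic can be checked without a cluster.

```powershell
# Hypothetical helper, not part of HPC Pack: counts requeue attempts per job ID.
$script:RequeueCounts = @{}

function Request-JobRequeue {
    param(
        [Parameter(Mandatory)] [int] $JobId,
        [Parameter(Mandatory)] [string] $State,        # e.g. (Get-HpcJob -Id $JobId).State
        [Parameter(Mandatory)] [scriptblock] $Requeue, # action that resubmits the job
        [int] $MaxAttempts = 3
    )

    # Only jobs that ended as Failed or Canceled are candidates for requeue.
    if ($State -notin 'Failed', 'Canceled') { return 'NotNeeded' }

    # A missing key yields $null, which casts to 0.
    $used = [int]$script:RequeueCounts[$JobId]
    if ($used -ge $MaxAttempts) { return 'GaveUp' }

    # Count the attempt even if the requeue call itself throws.
    $script:RequeueCounts[$JobId] = $used + 1
    try {
        & $Requeue $JobId | Out-Null
        return 'Requeued'
    }
    catch {
        Write-Warning "Requeue attempt $($used + 1) for job $JobId failed: $($_.Exception.Message)"
        return 'Error'
    }
}
```

On a cluster, the script block would wrap the requeue command shown above, for example `{ param($id) Submit-HpcJob -Id $id -Credential $credential -ErrorAction Stop }`. The `-ErrorAction Stop` is there so that a non-terminating cmdlet error reaches the `catch` block.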

Issue:

Sometimes jobs fail and are supposed to be requeued; we have code in place for that.
However, the requeue block fails with the following error:

Submit-HpcJob : Cannot modify the job properties when attempting to requeue the job. For more information, see
http://go.microsoft.com/fwlink/?LinkId=182875.
At H:\HPCScripts\CommonHPCFunctions.psm1:95 char:16
+ [void](Submit-HpcJob -id $jobId -Credential $credential)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (Microsoft.Compu...CPPSH.SubmitJob:SubmitJob) [Submit-HpcJob], SchedulerException
+ FullyQualifiedErrorId : Microsoft.ComputeCluster.CCPPSH.SubmitJob

Note: We have not made any changes to the job properties; we only pass the job ID to requeue the job.

If someone can guide us on this, that would be really helpful.

@Sumarigo-MSFT

Regards
Ajay

Community Center | Not monitored

1 answer

  1. jamesdust412 0 Reputation points
    2023-05-31T05:04:57.6633333+00:00

    Hi Ajay,

    Thank you for sharing the details of your Windows HPC cluster environment. It seems that you're facing an issue when trying to requeue jobs after they fail. The error message suggests that modifying the job properties during requeue is not allowed. It might be helpful to review the link provided in the error for more information. You mentioned that you haven't made any changes to the job properties, so it's worth investigating further to understand the cause of the error. Hopefully someone, like @Sumarigo-MSFT, can provide guidance to help resolve this issue. Best regards.
