Define Excluded Nodes for a Job

In HPC Pack, if tasks in your job consistently fail on a particular node, you can exclude that node from the job by adding it to the Excluded Nodes job property. When you add nodes to the Excluded Nodes list:

  • Tasks in the job that are running on a node that has been added to Excluded Nodes are canceled and marked as Failed (with the exception of Node Release tasks).

  • Node Release tasks run on the excluded node before the node is released.

  • No tasks in the job are started on nodes that are listed in Excluded Nodes.

  • If additions to the Excluded Nodes list cause the job to drop below its minimum resource requirements, the job is canceled and requeued.
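The rules above can be sketched as a toy model in Python. This is not HPC Pack code: the Task and Job classes, the state names, and the exclude_nodes and can_start_on helpers are hypothetical, and the model assumes that a Node Release task completes immediately and that the minimum resource requirement is counted in nodes.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Task:
    name: str
    node: Optional[str] = None       # node the task is assigned to, if any
    state: str = "Queued"            # "Queued", "Running", "Failed", "Finished"
    is_node_release: bool = False


@dataclass
class Job:
    min_nodes: int                   # assumed unit for the minimum resources
    allocated: Set[str]
    tasks: List[Task]
    excluded: Set[str] = field(default_factory=set)
    state: str = "Running"


def can_start_on(job: Job, node: str) -> bool:
    """No task in the job is started on an excluded node."""
    return node in job.allocated and node not in job.excluded


def exclude_nodes(job: Job, nodes: Iterable[str]) -> Job:
    """Apply the documented effects of adding nodes to Excluded Nodes."""
    job.excluded |= set(nodes)
    for task in job.tasks:
        if task.node not in job.excluded:
            continue
        if task.is_node_release:
            # Node Release tasks still run before the node is released.
            if task.state != "Finished":
                task.state = "Finished"
        elif task.state == "Running":
            # Other running tasks are canceled and marked as Failed.
            task.state = "Failed"
    # Release the excluded nodes from the job's allocation.
    job.allocated -= job.excluded
    # Below the minimum, the job is canceled and requeued.
    if len(job.allocated) < job.min_nodes:
        job.state = "Queued"
    return job
```

In this sketch, excluding one of three nodes from a job that needs at least two leaves the job running, while excluding a second node sends it back to the queue.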

For any active job that you own, you can add nodes to or remove nodes from the Excluded Nodes job property, or clear the list. The following commands modify and view the Excluded Nodes list from HPC PowerShell or a command prompt.

In HPC PowerShell, use the following cmdlets:

  • Set-HpcJob -Id <yourJobID> -AddExcludedNodes <nodeName>,<nodeName>

  • Set-HpcJob -Id <yourJobID> -RemoveExcludedNodes <nodeName>,<nodeName>

  • Set-HpcJob -Id <yourJobID> -ClearExcludedNodes

  • (Get-HpcJob -Id <yourJobID>).ExcludedNodes

  • Or to view all job properties, Get-HpcJob -Id <yourJobID> | fl

At a command prompt, use the following commands:

  • job modify <yourJobID> /addExcludedNodes:<nodeName>,<nodeName>

  • job modify <yourJobID> /removeExcludedNodes:<nodeName>,<nodeName>

  • job modify <yourJobID> /clearExcludedNodes

  • job view <yourJobID> /detailed | find /i "excludednodes"

  • Or to view all job properties, job view <yourJobID> /detailed
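If you drive these commands from a script, a thin wrapper can assemble the argument list for you. The following Python sketch is one possible approach and is not part of HPC Pack: job_modify_args and run_job_modify are hypothetical helper names, and the sketch assumes that the job command is on the PATH of a machine with the HPC Pack client utilities installed.

```python
import subprocess
from typing import Iterable, List


def job_modify_args(job_id: int, option: str,
                    nodes: Iterable[str] = ()) -> List[str]:
    """Build the argument list for one 'job modify' call (hypothetical helper)."""
    node_list = ",".join(n.strip() for n in nodes)
    if option == "clearExcludedNodes":
        if node_list:
            raise ValueError("clearExcludedNodes takes no node names")
        return ["job", "modify", str(job_id), "/clearExcludedNodes"]
    if option in ("addExcludedNodes", "removeExcludedNodes"):
        if not node_list:
            raise ValueError(option + " needs at least one node name")
        # Node names are passed as one comma-separated list after the colon.
        return ["job", "modify", str(job_id), "/" + option + ":" + node_list]
    raise ValueError("unknown option: " + option)


def run_job_modify(args: List[str]) -> str:
    """Run the command and return its output; needs the HPC Pack 'job' tool."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, job_modify_args(42, "addExcludedNodes", ["NODE01", "NODE02"]) produces the arguments for job modify 42 /addExcludedNodes:NODE01,NODE02, where the job ID and node names are placeholders.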

Note

For SOA jobs, the broker node automatically updates and maintains the list of excluded nodes according to the EndPointNotFoundRetryPeriod setting (in the service configuration file). This setting specifies how long the service host should retry loading the service and how long the broker should wait for a connection. If this time elapses, the broker adds the node (service host) to the Excluded Nodes list. The service configuration file also includes the maxExcludedNodes setting, which specifies how many nodes can be excluded before the session fails.
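The interaction between the two settings can be illustrated with a simplified Python model. This is only a sketch of the behavior described in the note, not the broker's actual implementation: the BrokerSketch class and its method names are hypothetical, times are plain millisecond counters, and the model assumes that the session fails once the number of excluded nodes exceeds maxExcludedNodes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class BrokerSketch:
    retry_period_ms: int            # stands in for EndPointNotFoundRetryPeriod
    max_excluded_nodes: int         # stands in for maxExcludedNodes
    first_failure_ms: Dict[str, int] = field(default_factory=dict)
    excluded: Set[str] = field(default_factory=set)
    session_failed: bool = False

    def connection_failed(self, node: str, now_ms: int) -> None:
        """Record a failed connection attempt to a service host."""
        if node in self.excluded:
            return
        # Remember when this node first became unreachable.
        start = self.first_failure_ms.setdefault(node, now_ms)
        if now_ms - start >= self.retry_period_ms:
            # Retry period elapsed: exclude the node.
            self.excluded.add(node)
            # Assumption: exceeding the limit fails the session.
            if len(self.excluded) > self.max_excluded_nodes:
                self.session_failed = True

    def connection_succeeded(self, node: str) -> None:
        """A successful connection resets the retry window for the node."""
        self.first_failure_ms.pop(node, None)
```

With a retry period of 1000 ms and a limit of one excluded node, a first node that stays unreachable for the full period is excluded and the session continues; a second such node takes the count past the limit and fails the session in this model.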

See Also

Job Submission in Microsoft HPC Pack