Managing the Job Queue
In Job Management, you can monitor and manage jobs that are submitted to the cluster. In the job list, each row represents a job, and the columns display job properties, job states, and metric values. The job list provides a starting point for drilling down into job details and performing actions on one or more jobs.
The order of the job queue is based on job priority level and submit time. Jobs with higher priority levels run before lower-priority jobs. Within a priority level, jobs are ordered by submit time, so the job submitted earliest comes first. You can help regulate the order of the job queue by:
Creating Job Templates that define the valid priority range for different types of jobs or different sets of users.
Modifying the priority level of submitted jobs to change the order of the job queue. You can specify Priority in terms of a priority band, a priority number, or a combination of the two. The numerical priority can have a value between 0 (Lowest) and 4000 (Highest).
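The ordering rule described above (higher priority first, then earlier submit time within the same priority) can be illustrated with a short Python sketch. This is only a model of the rule, not the scheduler's implementation: the QueuedJob class, the queue_order function, and the sample jobs are hypothetical names introduced for illustration, and the sketch ignores scheduler settings such as preemption and backfilling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueuedJob:
    """Hypothetical stand-in for a queued job; not an HPC Pack type."""
    name: str
    priority: int          # numerical priority, 0 (Lowest) to 4000 (Highest)
    submit_time: datetime


def queue_order(jobs):
    """Return jobs sorted by priority (highest first), then submit time (earliest first)."""
    for job in jobs:
        if not 0 <= job.priority <= 4000:
            raise ValueError(f"priority out of range for {job.name}: {job.priority}")
    # Negating the priority sorts higher values first; submit_time breaks ties.
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_time))


# Illustrative data only.
jobs = [
    QueuedJob("render", 2000, datetime(2024, 1, 1, 9, 30)),
    QueuedJob("urgent-fix", 4000, datetime(2024, 1, 1, 10, 0)),
    QueuedJob("nightly", 2000, datetime(2024, 1, 1, 9, 0)),
    QueuedJob("cleanup", 0, datetime(2024, 1, 1, 8, 0)),
]

ordered_names = [job.name for job in queue_order(jobs)]
print(ordered_names)
```

In this sample, "urgent-fix" moves ahead of jobs that were submitted earlier because its priority is higher, while "nightly" precedes "render" because both share a priority and "nightly" was submitted first.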
This topic provides an overview of how you can manage and monitor cluster jobs.
Configuring job scheduling policies
Job submission policies: Job templates are your primary method for defining custom job submission policies for your cluster. A job template allows you to associate a set of default values and value constraints for job properties (such as priority level) with a particular set of users. For more information, see Job Templates.
Resource allocation policies: The job scheduler configuration determines how resources are allocated to queued jobs. When you configure the HPC Job Scheduler Service, you can set scheduling policy options (such as balanced or queued mode, preemption, and backfilling), error handling, and job history options. For more information, see Configure the HPC Job Scheduler Service.
Advanced policy enforcement and license-aware scheduling: You can enforce site-specific job submission policies and job activation policies (such as license-aware scheduling) by creating custom job submission filters and job activation filters. For more information, see Understanding Activation and Submission Filters.
Viewing jobs and tasks
Monitor jobs: The job list displays information about jobs in the cluster. You can filter and sort the list, and choose which job properties and metric values to display in the list. For more information, see Filter and Sort the Job List.
Drill into job details: When you click a job in the list, detailed information about that job appears in the Detail Pane. You can also view job and task results. For more information, see View a Job or Task.
Track job statistics over time: HPC Cluster Manager provides several charts and reports to track job statistics for your cluster. For more information, see Charts and Reports: HPC Cluster Manager.
Performing job and task actions
As a cluster administrator or as the job owner, you can perform the following actions:
Cancel a Job or Task: Remove a job or task from the queue and free its resources.
Force Cancel a Job or Task: Stop a job or task immediately.
Requeue a Job or Task: Put unfinished jobs or tasks back into the queue.
Modify a Job: Make changes to job properties (such as priority level) or add tasks to active jobs.
Set and Clear Excluded Nodes for Jobs: If you notice that tasks consistently fail on a particular node, you can exclude that node from one or more jobs. If you resolve issues on a node, you can clear that node from the Excluded Nodes property of any active job.
Set the Progress and Progress Message Job Properties: Provide custom progress information about a job.
Copy a Job or Task: Run a job or task again, either as is or with changes.
Save a Job or Task to a File: Export the job or task specifications to an XML description file.