OpenPBS

2025-06-15

OpenPBS can be enabled on a CycleCloud cluster by modifying the "run_list" in the configuration section of your cluster definition. A PBS Professional (PBS Pro) cluster has two main parts: the 'master' node, which runs the software on a shared filesystem, and the 'execute' nodes, which mount that filesystem and run the submitted jobs. For example, a simple cluster template snippet may look like:

[cluster my-pbspro]

[[node master]]
    ImageName = cycle.image.centos7
    MachineType = Standard_A4 # 8 cores

    [[[configuration]]]
    run_list = role[pbspro_master_role]

[[nodearray execute]]
    ImageName = cycle.image.centos7
    MachineType = Standard_A1  # 1 core

    [[[configuration]]]
    run_list = role[pbspro_execute_role]

Importing and starting a cluster with this definition in CycleCloud yields a single 'master' node. Execute nodes can be added to the cluster via the cyclecloud add_node command. For example, to add 10 execute nodes:

cyclecloud add_node my-pbspro -t execute -c 10

PBS Resource-based Autoscaling

CycleCloud maintains two resources to expand the dynamic provisioning capability: nodearray and machinetype.

If you submit a job and specify a nodearray resource, for example qsub -l nodearray=highmem -- /bin/hostname, CycleCloud adds nodes to the nodearray named 'highmem'. If no such nodearray exists, the job stays idle.

Similarly, if a machinetype resource is specified with a job submission, for example qsub -l machinetype=Standard_L32s_v2 my-job.sh, CycleCloud autoscales 'Standard_L32s_v2' machines in the 'execute' (default) nodearray. If that machine type isn’t available in the 'execute' nodearray, the job stays idle.

These resources can be used in combination as:

qsub -l nodes=8:ppn=16:nodearray=hpc:machinetype=Standard_HB60rs my-simulation.sh

This job autoscales only if 'Standard_HB60rs' machines are specified in the 'hpc' nodearray.
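The matching rules described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the azpbs implementation: the NODEARRAYS mapping and the can_autoscale function are hypothetical names, and the machine types listed for each nodearray are made-up sample data.

```python
from typing import Dict, List, Optional

# Hypothetical cluster layout: nodearray name -> machine types it may use.
NODEARRAYS: Dict[str, List[str]] = {
    "execute": ["Standard_L32s_v2"],
    "hpc": ["Standard_HB60rs"],
    "highmem": ["Standard_E64s_v3"],
}


def can_autoscale(nodearray: Optional[str] = None,
                  machinetype: Optional[str] = None,
                  default_nodearray: str = "execute") -> bool:
    """Return True if a nodearray/machinetype request can be satisfied."""
    # Without an explicit nodearray, fall back to the default one.
    target = nodearray or default_nodearray
    if target not in NODEARRAYS:
        return False  # no such nodearray: the job stays idle
    if machinetype is not None and machinetype not in NODEARRAYS[target]:
        return False  # machine type not offered by that nodearray
    return True
```

With this sample layout, can_autoscale(nodearray="hpc", machinetype="Standard_HB60rs") succeeds, while requesting the same machine type without naming the 'hpc' nodearray does not.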

Adding extra queues assigned to nodearrays

On clusters with multiple nodearrays, it's common to create separate queues to automatically route jobs to the appropriate VM type. In this example, we assume the following "gpu" nodearray is defined in your cluster template:

[[nodearray gpu]]
    Extends = execute
    MachineType = Standard_NC24rs

    [[[configuration]]]
    pbspro.slot_type = gpu

After importing the cluster template and starting the cluster, the following commands can be run on the server node to create the "gpu" queue:

/opt/pbs/bin/qmgr -c "create queue gpu"
/opt/pbs/bin/qmgr -c "set queue gpu queue_type = Execution"
/opt/pbs/bin/qmgr -c "set queue gpu resources_default.ungrouped = false"
/opt/pbs/bin/qmgr -c "set queue gpu resources_default.place = scatter"
/opt/pbs/bin/qmgr -c "set queue gpu resources_default.slot_type = gpu"
/opt/pbs/bin/qmgr -c "set queue gpu default_chunk.ungrouped = false"
/opt/pbs/bin/qmgr -c "set queue gpu default_chunk.slot_type = gpu"
/opt/pbs/bin/qmgr -c "set queue gpu enabled = true"
/opt/pbs/bin/qmgr -c "set queue gpu started = true"

Note

As shown in the example, the queue definition packs all VMs in the queue into a single VM scale set to support MPI jobs. To define the queue for serial jobs and allow multiple VM scale sets, set ungrouped = true for both resources_default and default_chunk. You can also set resources_default.place = pack if you want the scheduler to pack jobs onto VMs instead of allocating jobs round-robin. For more information on PBS job packing, see the official PBS Professional OSS documentation.
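If you create several such queues, generating the qmgr commands can be less error-prone than typing them. The following Python sketch reproduces the command list shown above and exposes the ungrouped and place settings from the note as parameters. The queue_commands helper is a hypothetical name, and the sketch only builds the command strings; it does not run them.

```python
from typing import List

QMGR = "/opt/pbs/bin/qmgr"


def queue_commands(queue: str, slot_type: str,
                   ungrouped: bool = False,
                   place: str = "scatter") -> List[str]:
    """Build the qmgr command lines that create and start one queue."""
    flag = "true" if ungrouped else "false"
    settings = [
        f"create queue {queue}",
        f"set queue {queue} queue_type = Execution",
        f"set queue {queue} resources_default.ungrouped = {flag}",
        f"set queue {queue} resources_default.place = {place}",
        f"set queue {queue} resources_default.slot_type = {slot_type}",
        f"set queue {queue} default_chunk.ungrouped = {flag}",
        f"set queue {queue} default_chunk.slot_type = {slot_type}",
        f"set queue {queue} enabled = true",
        f"set queue {queue} started = true",
    ]
    # Wrap each setting the same way as the commands listed above.
    return [f'{QMGR} -c "{setting}"' for setting in settings]


if __name__ == "__main__":
    # MPI-style queue, as in the "gpu" example.
    print("\n".join(queue_commands("gpu", "gpu")))
```

For a serial queue, a call such as queue_commands("htc", "htc", ungrouped=True, place="pack") would emit the variant described in the note; the queue name "htc" is only an example.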

PBS Professional Configuration Reference

The following PBS Professional (PBS Pro) specific configuration options can be set to customize functionality:

PBS Pro Options	Description
pbspro.slots	The number of slots for a given node to report to PBS Pro. The number of slots is the number of concurrent jobs a node can execute; this value defaults to the number of CPUs on a given machine. You can override this value in cases where you don't run jobs based on CPU but on memory, GPUs, etc.
pbspro.slot_type	The name of the type of 'slot' a node provides. The default is 'execute'. When a job is tagged with the hard resource `slot_type=<type>`, that job runs only on machines of the same slot type. This allows you to create different software and hardware configurations per node and ensure an appropriate job is always scheduled on the correct type of node.
pbspro.version	Default: '18.1.3-0'. This is currently the default version and the only option to install and run. In the future, more versions of the PBS Pro software may be supported.

Connect PBS with CycleCloud

CycleCloud manages OpenPBS clusters through an installable agent called azpbs. This agent connects to CycleCloud to read cluster and VM configurations and also integrates with OpenPBS to process the job and host information. All azpbs configurations are found in the autoscale.json file, normally /opt/cycle/pbspro/autoscale.json. An excerpt showing the CycleCloud connection settings (other keys omitted):

{
  "password": "260D39rWX13X",
  "url": "https://cyclecloud1.contoso.com",
  "username": "cyclecloud_api_user",
  "logging": {
    "config_file": "/opt/cycle/pbspro/logging.conf"
  },
  "cluster_name": "mechanical_grid"
}
Important Files

The azpbs agent parses the PBS configuration (jobs, queues, resources) each time it's called. Information is written to the command's stderr and stdout and to a log file, both at configurable levels. All PBS management commands (qcmd) are also logged to file with their arguments.

All these files can be found in the /opt/cycle/pbspro/ directory where the agent is installed.

File	Location	Description
Autoscale Config	autoscale.json	Configuration for Autoscale, Resource Map, CycleCloud access information
Autoscale Log	autoscale.log	Agent main thread logging including CycleCloud host management
Demand Log	demand.log	Detailed log for resource matching
qcmd Trace Log	qcmd.log	Log of the agent's `qcmd` calls
Logging Config	logging.conf	Configurations for logging masks and file locations

Defining OpenPBS Resources

The cyclecloud-pbspro (azpbs) project allows general association of OpenPBS resources with Azure VM resources. This resource relationship is defined in autoscale.json. The default resources defined by the cluster template we ship are:

{"default_resources": [
   {
      "select": {},
      "name": "ncpus",
      "value": "node.vcpu_count"
   },
   {
      "select": {},
      "name": "group_id",
      "value": "node.placement_group"
   },
   {
      "select": {},
      "name": "host",
      "value": "node.hostname"
   },
   {
      "select": {},
      "name": "mem",
      "value": "node.memory"
   },
   {
      "select": {},
      "name": "vm_size",
      "value": "node.vm_size"
   },
   {
      "select": {},
      "name": "disk",
      "value": "size::20g"
   }]
}

The OpenPBS resource named mem is equated to a node attribute named node.memory, which is the total memory of any virtual machine. This configuration allows azpbs to process a resource request such as -l mem=4gb by comparing the value of the job resource requirements to node resources.
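To make the comparison concrete, here is a minimal Python sketch of how a request such as mem=4gb could be checked against a node's memory. It is not the azpbs matching code: parse_size and node_satisfies_mem are hypothetical helpers, and the sketch assumes byte-based suffixes (b, kb, mb, gb, tb) with binary multipliers only.

```python
import re

# Binary multipliers for the byte-based suffixes handled by this sketch.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert a size string such as '4gb' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*([kmgt]?b)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def node_satisfies_mem(requested: str, node_memory: str) -> bool:
    """True if the node has at least as much memory as the job requests."""
    return parse_size(requested) <= parse_size(node_memory)
```

A node reporting 8gb would satisfy a 4gb request in this sketch, while a 16gb request would not match it.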

Currently, disk size is hardcoded to size::20g. Here's an example of handling VM size-specific disk sizes:

   {
      "select": {"node.vm_size": "Standard_F2"},
      "name": "disk",
      "value": "size::20g"
   },
   {
      "select": {"node.vm_size": "Standard_H44rs"},
      "name": "disk",
      "value": "size::2t"
   }
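The effect of the select clause can be illustrated with a short Python sketch. It assumes that entries are evaluated in order, that an empty select matches every node, and that the first matching entry for a resource name wins; these evaluation rules are assumptions made for the example rather than a statement of how azpbs behaves. The resolve_resources function is hypothetical and copies values literally instead of evaluating expressions such as node.vcpu_count.

```python
from typing import Any, Dict, List

# The two VM-size-specific disk entries from the example above.
DISK_RULES: List[Dict[str, Any]] = [
    {"select": {"node.vm_size": "Standard_F2"}, "name": "disk", "value": "size::20g"},
    {"select": {"node.vm_size": "Standard_H44rs"}, "name": "disk", "value": "size::2t"},
]


def resolve_resources(node: Dict[str, str],
                      rules: List[Dict[str, Any]]) -> Dict[str, str]:
    """Apply rules in order; the first match per resource name is kept."""
    resolved: Dict[str, str] = {}
    for rule in rules:
        # An empty select has no conditions, so it matches every node.
        matches = all(node.get(key) == want for key, want in rule["select"].items())
        if matches and rule["name"] not in resolved:
            resolved[rule["name"]] = rule["value"]
    return resolved
```

Under these assumptions, a catch-all entry with an empty select placed after the specific entries acts as a fallback for every other VM size.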

Autoscale and Scalesets

CycleCloud treats spanning and serial jobs differently in OpenPBS clusters. Spanning jobs land on nodes that are part of the same placement group. The placement group has a particular platform meaning (VirtualMachineScaleSet with SinglePlacementGroup=true), and CycleCloud manages a named placement group for each spanned node set. Use the PBS resource group_id for this placement group name.

The hpc queue appends the equivalent of -l place=scatter:group=group_id by using native queue defaults.
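Grouping by group_id means that a spanning job only fits if a single placement group holds enough nodes. The Python sketch below illustrates that constraint on a plain list of node records. It is a simplified model, not the scheduler's placement logic: pick_placement_group is a hypothetical helper, and it ignores per-node resources such as ncpus and mem.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def pick_placement_group(nodes: List[Dict[str, str]],
                         nodes_needed: int) -> Optional[str]:
    """Return a group_id whose nodes can hold the whole spanning job."""
    by_group: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        by_group[node["group_id"]].append(node["host"])
    # Check groups in name order so the result is deterministic.
    for group_id in sorted(by_group):
        if len(by_group[group_id]) >= nodes_needed:
            return group_id
    return None  # no single placement group is large enough
```

In this model, two nodes in group "pg0" and one node in group "pg1" can host a 2-node spanning job in "pg0", but not a 3-node job, even though three nodes exist in total.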

Installing the CycleCloud OpenPBS Agent `azpbs`

The OpenPBS CycleCloud cluster manages the installation and configuration of the agent on the server node. The preparation includes setting PBS resources, queues, and hooks. A scripted install can be done outside of CycleCloud as well.

# Prerequisite: python3, 3.6 or newer, must be installed and in the PATH
wget https://github.com/Azure/cyclecloud-pbspro/releases/download/2.0.5/cyclecloud-pbspro-pkg-2.0.5.tar.gz
tar xzf cyclecloud-pbspro-pkg-2.0.5.tar.gz
cd cyclecloud-pbspro

# Optional, but recommended. Adds relevant resources and enables strict placement
./initialize_pbs.sh

# Optional. Sets up workq as a colocated, MPI focused queue and creates htcq for non-MPI workloads.
./initialize_default_queues.sh

# Creates the azpbs autoscaler
./install.sh --venv /opt/cycle/pbspro/venv

# Insert your username, password, url, and cluster name here.
./generate_autoscale_json.sh --install-dir /opt/cycle/pbspro \
                             --username user \
                             --password password \
                             --url https://fqdn:port \
                             --cluster-name cluster_name

azpbs validate

CycleCloud supports a standard set of autostop attributes across schedulers:

Attribute	Description
cyclecloud.cluster.autoscale.stop_enabled	Is autostop enabled on this node? [true/false]
cyclecloud.cluster.autoscale.idle_time_after_jobs	The amount of time (in seconds) for a node to sit idle after completing jobs before it is scaled down.
cyclecloud.cluster.autoscale.idle_time_before_jobs	The amount of time (in seconds) for a node that has not yet run any jobs to sit idle before it is scaled down.
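The way these three attributes interact can be sketched as a small decision function. This is an illustrative Python model, not CycleCloud code: AutostopSettings and should_stop are hypothetical names, and the timeout values are placeholders rather than documented defaults.

```python
from dataclasses import dataclass


@dataclass
class AutostopSettings:
    stop_enabled: bool = True
    idle_time_after_jobs: int = 300     # seconds; placeholder value
    idle_time_before_jobs: int = 1800   # seconds; placeholder value


def should_stop(settings: AutostopSettings,
                idle_seconds: int,
                has_run_jobs: bool) -> bool:
    """Decide whether an idle node is eligible to be scaled down."""
    if not settings.stop_enabled:
        return False
    # Nodes that already ran jobs use the "after jobs" timeout;
    # nodes still waiting for their first job use the "before jobs" one.
    if has_run_jobs:
        limit = settings.idle_time_after_jobs
    else:
        limit = settings.idle_time_before_jobs
    return idle_seconds >= limit
```

With the placeholder values, a node that has finished its jobs and sat idle for 400 seconds is eligible to stop, while a freshly started node with the same idle time is not.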

Note

CycleCloud doesn't support the bursting configuration with OpenPBS.

Note

Even though Windows is an officially supported OpenPBS platform, CycleCloud doesn't support running OpenPBS on Windows at this time.
