OpenPBS

2025-06-17

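OpenPBS can be enabled on a CycleCloud cluster by modifying the "run_list" in the configuration section of your cluster definition. A PBS Professional (PBS Pro) cluster has two main parts: the "master" node, which provides the shared file system on which the PBS Professional software runs, and the "execute" nodes, which mount that file system and run the submitted jobs. For example, a simple cluster template snippet may look like this: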

[cluster my-pbspro]

[[node master]]
    ImageName = cycle.image.centos7
    MachineType = Standard_A4 # 8 cores

    [[[configuration]]]
    run_list = role[pbspro_master_role]

[[nodearray execute]]
    ImageName = cycle.image.centos7
    MachineType = Standard_A1  # 1 core

    [[[configuration]]]
    run_list = role[pbspro_execute_role]

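Importing and starting a cluster with this definition in CycleCloud yields a single "master" node. Execute nodes can be added to the cluster with the cyclecloud add_node command. For example, to add 10 execute nodes: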

cyclecloud add_node my-pbspro -t execute -c 10

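PBS resource-based autoscaling

CycleCloud maintains two resources to extend its dynamic provisioning capability. These resources are nodearray and machinetype.

If you submit a job and specify a nodearray resource with qsub -l nodearray=highmem -- /bin/hostname, CycleCloud adds nodes to the nodearray named "highmem". If no such nodearray exists, the job stays idle.

Similarly, if a machinetype resource is specified in a job submission, for example qsub -l machinetype=Standard_L32s_v2 my-job.sh, CycleCloud autoscales "Standard_L32s_v2" machines in the "execute" (default) nodearray. If that machine type isn't available in the "execute" nodearray, the job stays idle.

These resources can be used in combination, as follows: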

qsub -l nodes=8:ppn=16:nodearray=hpc:machinetype=Standard_HB60rs my-simulation.sh

This job autoscales only if "Standard_HB60rs" machines are specified in the "hpc" nodearray.
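The matching rules above can be sketched in Python. This is a simplified model under stated assumptions, not CycleCloud's actual implementation: the `NODEARRAYS` mapping and the `pick_nodearray` helper are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical cluster model: nodearray name -> machine types it may provision.
NODEARRAYS: Dict[str, List[str]] = {
    "execute": ["Standard_A1", "Standard_L32s_v2"],
    "hpc": ["Standard_HB60rs"],
}


def pick_nodearray(resources: Dict[str, str],
                   nodearrays: Dict[str, List[str]],
                   default: str = "execute") -> Optional[str]:
    """Return the nodearray that would scale, or None if the job stays idle."""
    name = resources.get("nodearray", default)
    machine_types = nodearrays.get(name)
    if machine_types is None:
        return None  # no such nodearray: the job stays idle
    wanted = resources.get("machinetype")
    if wanted is not None and wanted not in machine_types:
        return None  # machine type not available in that nodearray
    return name


# Job resources as they would appear after "qsub -l ...".
print(pick_nodearray({"nodearray": "hpc", "machinetype": "Standard_HB60rs"}, NODEARRAYS))
print(pick_nodearray({"nodearray": "highmem"}, NODEARRAYS))
```

The first call names a nodearray that offers the requested machine type, so it returns that nodearray. The second names a nodearray that doesn't exist, so the function returns None, which corresponds to the job staying idle.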

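Adding additional queues assigned to nodearrays

On clusters with multiple nodearrays, it's common to create separate queues that automatically route jobs to the appropriate VM type. In this example, we assume the following "gpu" nodearray is defined in your cluster template: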

    [[nodearray gpu]]
    Extends = execute
    MachineType = Standard_NC24rs

        [[[configuration]]]
        pbspro.slot_type = gpu

After importing the cluster template and starting the cluster, run the following commands on the server node to create the "gpu" queue:

/opt/pbs/bin/qmgr -c "create queue gpu"
/opt/pbs/bin/qmgr -c "set queue gpu queue_type = Execution"
/opt/pbs/bin/qmgr -c "set queue gpu resources_default.ungrouped = false"
/opt/pbs/bin/qmgr -c "set queue gpu resources_default.place = scatter"
/opt/pbs/bin/qmgr -c "set queue gpu resources_default.slot_type = gpu"
/opt/pbs/bin/qmgr -c "set queue gpu default_chunk.ungrouped = false"
/opt/pbs/bin/qmgr -c "set queue gpu default_chunk.slot_type = gpu"
/opt/pbs/bin/qmgr -c "set queue gpu enabled = true"
/opt/pbs/bin/qmgr -c "set queue gpu started = true"

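Note

As shown in the example, the queue definition packs all VMs in the queue into a single VM scale set to support MPI jobs. To define a queue for serial jobs and allow multiple VM scale sets, set ungrouped = true for both resources_default and default_chunk. You can also set resources_default.place = pack if you want the scheduler to pack jobs onto VMs instead of allocating them round-robin. For more information on PBS job packing, see the official PBS Professional OSS documentation.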
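As a rough illustration of how the grouped (MPI) and ungrouped (serial) variants differ, the following Python sketch generates the same qmgr command lines shown above from a few parameters. The `queue_commands` helper and its arguments are hypothetical; it only builds strings and doesn't call qmgr itself.

```python
from typing import List

QMGR = "/opt/pbs/bin/qmgr"


def queue_commands(queue: str, slot_type: str,
                   grouped: bool = True, place: str = "scatter") -> List[str]:
    """Build the qmgr command lines that create and enable a queue."""
    # grouped=True keeps all VMs in one scale set (MPI); False allows several.
    ungrouped = "false" if grouped else "true"
    settings = [
        f"create queue {queue}",
        f"set queue {queue} queue_type = Execution",
        f"set queue {queue} resources_default.ungrouped = {ungrouped}",
        f"set queue {queue} resources_default.place = {place}",
        f"set queue {queue} resources_default.slot_type = {slot_type}",
        f"set queue {queue} default_chunk.ungrouped = {ungrouped}",
        f"set queue {queue} default_chunk.slot_type = {slot_type}",
        f"set queue {queue} enabled = true",
        f"set queue {queue} started = true",
    ]
    return [f'{QMGR} -c "{setting}"' for setting in settings]


# Serial-job variant: ungrouped, with jobs packed onto VMs.
for line in queue_commands("gpu", "gpu", grouped=False, place="pack"):
    print(line)
```

Generating the commands from one function keeps the resources_default and default_chunk values in sync, which is the part that is easy to get wrong when editing the commands by hand.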

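PBS Professional configuration reference

The following PBS Professional (PBS Pro) specific configuration options can be set to customize functionality:

PBS Pro option	Description
pbspro.slots	The number of slots a given node reports to PBS Pro. The number of slots is the number of concurrent jobs a node can execute; this value defaults to the number of CPUs on the machine. You can override this value if you run jobs based on memory, GPUs, or another resource instead of CPU.
pbspro.slot_type	The name of the type of "slot" a node provides. The default is "execute". When a job is tagged with the hard resource `slot_type=<type>`, that job runs only on machines of the same slot type. This setting lets you create different software and hardware configurations per node and ensures that jobs are always scheduled on the correct type of node.
pbspro.version	Default: "18.1.3-0". This version is currently the default and the only option to install and run. More versions of the PBS Pro software may be supported in the future.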

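Connect PBS with CycleCloud

CycleCloud manages OpenPBS clusters through an installable agent called azpbs. This agent connects to CycleCloud to read cluster and VM configurations, and it also integrates with OpenPBS to process job and host information. All azpbs configuration is in the autoscale.json file, normally located at /opt/cycle/pbspro/autoscale.json.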

  "password": "260D39rWX13X",
  "url": "https://cyclecloud1.contoso.com",
  "username": "cyclecloud_api_user",
  "logging": {
    "config_file": "/opt/cycle/pbspro/logging.conf"
  },
  "cluster_name": "mechanical_grid",

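Important files

Each time the azpbs agent is called, it parses the PBS configuration: jobs, queues, and resources. The information is written to the command's stderr and stdout and to a log file, each at a configurable level. All PBS management commands (qcmd) are also logged to a file, along with their arguments.

All of these files can be found in the /opt/cycle/pbspro/ directory, where the agent is installed.

File	Location	Description
Autoscale config	autoscale.json	Configuration for autoscale, resource map, and CycleCloud access information
Autoscale log	autoscale.log	Agent main thread logging, including CycleCloud host management
Demand log	demand.log	Detailed log for resource matching
qcmd trace log	qcmd.log	Logging of the agent `qcmd` calls
Logging config	logging.conf	Configuration for logging masks and file locations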
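Because autoscale.json holds the CycleCloud access information, a quick sanity check before running the agent can be useful. The following is a minimal sketch, assuming the connection keys shown in the earlier excerpt (url, username, password, cluster_name) must all be present; whether azpbs itself treats each of them as mandatory is an assumption here, and the `missing_keys` and `load_config` helpers are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumption: these keys, taken from the autoscale.json excerpt, are all needed.
REQUIRED_KEYS = ["url", "username", "password", "cluster_name"]


def missing_keys(config: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent or empty in the config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path: str = "/opt/cycle/pbspro/autoscale.json") -> Dict[str, Any]:
    """Read the autoscale configuration from its usual location."""
    with open(path) as handle:
        return json.load(handle)


# In-memory example with placeholder values, so the sketch runs anywhere.
sample = json.loads("""
{
  "url": "https://cyclecloud1.contoso.com",
  "username": "cyclecloud_api_user",
  "logging": {"config_file": "/opt/cycle/pbspro/logging.conf"},
  "cluster_name": "mechanical_grid"
}
""")
print(missing_keys(sample))
```

On a server node, `missing_keys(load_config())` would perform the same check against the installed file.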

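Defining OpenPBS resources

The cyclecloud-pbspro (azpbs) project allows a general association of OpenPBS resources with Azure VM resources. This resource relationship is defined in autoscale.json. The default resources defined by the cluster template we ship are: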

{"default_resources": [
   {
      "select": {},
      "name": "ncpus",
      "value": "node.vcpu_count"
   },
   {
      "select": {},
      "name": "group_id",
      "value": "node.placement_group"
   },
   {
      "select": {},
      "name": "host",
      "value": "node.hostname"
   },
   {
      "select": {},
      "name": "mem",
      "value": "node.memory"
   },
   {
      "select": {},
      "name": "vm_size",
      "value": "node.vm_size"
   },
   {
      "select": {},
      "name": "disk",
      "value": "size::20g"
   }]
}

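The OpenPBS resource named mem is equated to the node attribute named node.memory, which is the total memory of the virtual machine. This configuration allows azpbs to process a resource request such as -l mem=4gb by comparing the value of the job's resource requirement with the node's resource.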
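The comparison described above can be sketched as follows. This is an illustration only: the `parse_size` and `node_satisfies` helpers are hypothetical, and the sketch assumes 1024-based units and the size strings seen in this article ("4gb", "size::20g"); the parsing azpbs actually performs may differ.

```python
import re

# Multipliers for size suffixes, assuming 1024-based units.
_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert strings such as '4gb', '20g', or 'size::2t' to bytes."""
    text = text.lower().strip()
    if text.startswith("size::"):
        text = text[len("size::"):]
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([bkmgt])?b?", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit or "b"])


def node_satisfies(requested: str, node_value: str) -> bool:
    """True if the node's resource is at least the requested amount."""
    return parse_size(node_value) >= parse_size(requested)


# A job submitted with "-l mem=4gb" against nodes with 8 GB and 2 GB of memory.
print(node_satisfies("4gb", "8g"))
print(node_satisfies("4gb", "2gb"))
```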

Currently, the disk size is hardcoded to size::20g. Here's an example of handling a disk size that is specific to the VM size:

   {
      "select": {"node.vm_size": "Standard_F2"},
      "name": "disk",
      "value": "size::20g"
   },
   {
      "select": {"node.vm_size": "Standard_H44rs"},
      "name": "disk",
      "value": "size::2t"
   }
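To make the role of "select" concrete, the following sketch resolves a list of default_resources entries against one node. It is a simplified model with hypothetical helper names, and it assumes that an empty select matches every node, that values starting with "node." are looked up on the node, and that the first matching entry for a given name wins; none of these details are confirmed by this article.

```python
from typing import Any, Dict, List


def _node_attr(node: Dict[str, Any], ref: str) -> Any:
    """Look up 'node.<attr>' (or a bare attribute name) on the node."""
    key = ref[len("node."):] if ref.startswith("node.") else ref
    return node.get(key)


def resolve_resources(node: Dict[str, Any],
                      default_resources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute the PBS resources a node would report."""
    resolved: Dict[str, Any] = {}
    for entry in default_resources:
        select = entry.get("select", {})
        # Skip entries whose select constraints don't match this node.
        if any(_node_attr(node, key) != want for key, want in select.items()):
            continue
        value = entry["value"]
        if isinstance(value, str) and value.startswith("node."):
            value = _node_attr(node, value)
        # Assumption: an earlier matching entry is not overridden.
        resolved.setdefault(entry["name"], value)
    return resolved


defaults = [
    {"select": {}, "name": "ncpus", "value": "node.vcpu_count"},
    {"select": {"node.vm_size": "Standard_F2"}, "name": "disk", "value": "size::20g"},
    {"select": {"node.vm_size": "Standard_H44rs"}, "name": "disk", "value": "size::2t"},
]
node = {"vm_size": "Standard_H44rs", "vcpu_count": 44}
print(resolve_resources(node, defaults))
```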

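Autoscale and scale sets

CycleCloud treats spanning and serial jobs differently in OpenPBS clusters. Spanning jobs land on nodes that are part of the same placement group. The placement group has a particular platform meaning (a VirtualMachineScaleSet with SinglePlacementGroup=true), and CycleCloud manages a named placement group for each spanned node set. Use the PBS resource group_id for this placement group name.

The hpc queue appends the equivalent of -l place=scatter:group=group_id by using native queue defaults.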
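The effect of grouping by group_id can be illustrated with a small sketch: a spanning job is only placed if enough nodes share one placement group. The node dictionaries and the `place_spanning_job` helper are hypothetical; the real scheduling decision is made by PBS, not by code like this.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def nodes_by_group(nodes: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each placement group (group_id) to the hosts it contains."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        groups[node["group_id"]].append(node["host"])
    return dict(groups)


def place_spanning_job(nodes: List[Dict[str, str]], count: int) -> Optional[List[str]]:
    """Pick `count` hosts from a single placement group, or None if none fits."""
    for hosts in nodes_by_group(nodes).values():
        if len(hosts) >= count:
            return hosts[:count]
    return None


cluster = [
    {"host": "n1", "group_id": "pg0"},
    {"host": "n2", "group_id": "pg0"},
    {"host": "n3", "group_id": "pg1"},
]
print(place_spanning_job(cluster, 2))
print(place_spanning_job(cluster, 3))
```

In this model a three-node job isn't placed even though three nodes exist, because they don't share a placement group; in a real cluster this is the situation in which more nodes would need to be added to a single group.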

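Installing the CycleCloud OpenPBS agent `azpbs`

The OpenPBS CycleCloud cluster manages the installation and configuration of the agent on the server node. The preparation includes setting up PBS resources, queues, and hooks. A scripted install can also be done outside of CycleCloud.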

# Prerequisite: python3, 3.6 or newer, must be installed and in the PATH
wget https://github.com/Azure/cyclecloud-pbspro/releases/download/2.0.5/cyclecloud-pbspro-pkg-2.0.5.tar.gz
tar xzf cyclecloud-pbspro-pkg-2.0.5.tar.gz
cd cyclecloud-pbspro

# Optional, but recommended. Adds relevant resources and enables strict placement
./initialize_pbs.sh

# Optional. Sets up workq as a colocated, MPI focused queue and creates htcq for non-MPI workloads.
./initialize_default_queues.sh

# Creates the azpbs autoscaler
./install.sh  --venv /opt/cycle/pbspro/venv

# Insert your username, password, url, and cluster name here.
./generate_autoscale_json.sh --install-dir /opt/cycle/pbspro \
                             --username user \
                             --password password \
                             --url https://fqdn:port \
                             --cluster-name cluster_name

azpbs validate

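CycleCloud supports a standard set of autostop attributes across schedulers:

Attribute	Description
cyclecloud.cluster.autoscale.stop_enabled	Is autostop enabled on this node? [true/false]
cyclecloud.cluster.autoscale.idle_time_after_jobs	The amount of time (in seconds) for a node to sit idle after completing jobs before it's scaled down.
cyclecloud.cluster.autoscale.idle_time_before_jobs	The amount of time (in seconds) for a node that hasn't yet completed any jobs to sit idle before it's scaled down.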
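How the three autostop settings interact can be sketched as a simple decision function. The `AutostopSettings` class, its field names, and the default values are shorthand chosen for this illustration, not values read from CycleCloud; the sketch assumes the "after jobs" limit applies once a node has run at least one job and the "before jobs" limit applies otherwise.

```python
from dataclasses import dataclass


@dataclass
class AutostopSettings:
    # Illustrative values only; real defaults come from the cluster template.
    stop_enabled: bool = True
    idle_time_after_jobs: int = 300    # seconds
    idle_time_before_jobs: int = 1800  # seconds


def should_stop(settings: AutostopSettings, idle_seconds: int, has_run_jobs: bool) -> bool:
    """Decide whether an idle node is due to be scaled down."""
    if not settings.stop_enabled:
        return False
    if has_run_jobs:
        limit = settings.idle_time_after_jobs
    else:
        limit = settings.idle_time_before_jobs
    return idle_seconds >= limit


settings = AutostopSettings()
print(should_stop(settings, idle_seconds=600, has_run_jobs=True))
print(should_stop(settings, idle_seconds=600, has_run_jobs=False))
```

With these illustrative values, a node that has been idle for ten minutes is stopped if it already ran jobs, but is kept if it is still waiting for its first job.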

Note

CycleCloud doesn't support the bursting configuration with OpenPBS.

Note

Although Windows is an officially supported OpenPBS platform, CycleCloud doesn't currently support running OpenPBS on Windows.