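Workload YAML reference

Important

The AI Runtime CLI is currently in Beta.

This page is the reference for the workload YAML configuration passed to air run --file.

Note

The source of truth for the YAML configuration is the help built into the CLI. Run air -h config for a top-level view, and air -h config.<section> (for example, air -h config.environment) for details on each section.

Minimal configuration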

experiment_name: my-training
environment:
  dependencies: requirements.yaml
compute:
  num_accelerators: 1
  accelerator_type: GPU_1xA10
command: echo "Hello World"

Submit it with:

air run --file train.yaml -p profile

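Core concepts

Core fields

Most training configurations have five components:

experiment_name: Required. Creates the MLflow experiment, or appends to it if it already exists.
environment: Optional. Python dependencies and base environment.
compute: Required. GPU resources (type and count).
command: Required. The bash command or commands that launch training.
code_source: Optional. Path to the training code that is made available remotely.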
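
As a rough illustration of the list above, the following sketch checks a configuration that has already been loaded into a Python dict for the three fields marked as required. The helper name missing_required_fields is hypothetical and not part of the air CLI, whose own validation may be stricter or behave differently.

```python
# Hypothetical pre-submit check; not part of the air CLI.
REQUIRED_FIELDS = ("experiment_name", "compute", "command")


def missing_required_fields(config: dict) -> list[str]:
    """Return the required top-level fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not config.get(name)]


config = {
    "experiment_name": "my-training",
    "compute": {"num_accelerators": 1, "accelerator_type": "GPU_1xA10"},
}
print(missing_required_fields(config))  # ['command']
```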

Your first training job

experiment_name: simple-training
environment:
  dependencies: requirements.yaml
compute:
  num_accelerators: 8
  accelerator_type: GPU_8xH100
code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
command: torchrun --nproc_per_node=8 $CODE_SOURCE_PATH/train.py

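In this configuration:

experiment_name creates an MLflow experiment named simple-training (or appends a new run if the experiment already exists).
environment installs the dependencies from requirements.yaml.
compute allocates one H100 node (8 H100 GPUs).
code_source uploads the repo folder to the node, where it is available at $CODE_SOURCE_PATH.
command runs train.py with torchrun across the 8 H100 GPUs. The file's local path is /home/username/repo/train.py.

Common use cases

Add environment variables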

experiment_name: training-with-env
environment:
  dependencies: requirements.yaml
env_variables:
  BATCH_SIZE: '32'
  LEARNING_RATE: '0.001'
compute:
  num_accelerators: 8
  accelerator_type: GPU_8xH100
code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
    git:
      branch: main
command: torchrun --nproc_per_node=8 train.py
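
Inside the training script, these values would typically be read from the process environment. The sketch below assumes that the keys under env_variables reach the job as ordinary environment variables, which the field name implies but this page does not spell out. The values are quoted strings in the YAML, so the script converts them explicitly. The function name read_training_env is illustrative.

```python
import os
from typing import Mapping


def read_training_env(env: Mapping[str, str] = os.environ) -> tuple[int, float]:
    """Convert the string values from the environment into numbers."""
    batch_size = int(env["BATCH_SIZE"])
    learning_rate = float(env["LEARNING_RATE"])
    return batch_size, learning_rate


# Local demonstration using the values from the YAML above.
print(read_training_env({"BATCH_SIZE": "32", "LEARNING_RATE": "0.001"}))
```

Passing the mapping as an argument keeps the function easy to test locally; in the job itself, calling read_training_env() with no argument reads os.environ.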

Use secrets (API keys, tokens)

experiment_name: training-with-secrets
environment:
  dependencies: requirements.yaml
secrets:
  HF_TOKEN: 'my_scope/hf_token'
  WANDB_API_KEY: 'my_scope/wandb'
compute:
  num_accelerators: 8
  accelerator_type: GPU_8xH100
code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
    git:
      branch: main
command: torchrun --nproc_per_node=8 train.py

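Secrets use the format scope/key and must be configured in Databricks secrets. See Secret management for setup.

When you share a YAML template, other users must create their own secrets or be granted access to the referenced ones.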
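
Before sharing a template, it can help to confirm that every secret reference has the scope/key shape described above. This is a minimal sketch with a hypothetical helper, parse_secret_ref. It assumes a reference contains exactly one slash, which matches the examples on this page but is not confirmed as a rule of the CLI.

```python
def parse_secret_ref(ref: str) -> tuple[str, str]:
    """Split a 'scope/key' reference into (scope, key)."""
    scope, sep, key = ref.partition("/")
    # Assumption: exactly one slash, with non-empty text on both sides.
    if not sep or not scope or not key or "/" in key:
        raise ValueError(f"expected 'scope/key', got {ref!r}")
    return scope, key


secrets = {"HF_TOKEN": "my_scope/hf_token", "WANDB_API_KEY": "my_scope/wandb"}
for name, ref in secrets.items():
    print(name, parse_secret_ref(ref))
```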

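Working with code source

The code_source block uploads local code so that the training job can run it.

root_path is the local directory to snapshot. By default, air packages the working tree as-is (including uncommitted changes) into a plain tarball.
To snapshot a pinned git version, add a git: block with a branch or commit. This requires root_path to be a git repository and enables version-aware snapshots (caching, git archive).
For large repositories, use include_paths to snapshot a subset.

Minimal example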

experiment_name: simple-training
environment:
  dependencies: requirements.yaml
compute:
  num_accelerators: 8
  accelerator_type: GPU_8xH100
code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
command: python $CODE_SOURCE_PATH/train.py

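On the remote machine, the code is placed at /databricks/code_source/<directory_name>, where <directory_name> is the final path component of root_path. $CODE_SOURCE_PATH is set to that absolute path; use it in your command instead of hardcoding the location.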
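
The mapping from root_path to the remote location can be sketched as follows. The function name remote_code_path is illustrative; the job itself should still read $CODE_SOURCE_PATH rather than recompute the path.

```python
import posixpath


def remote_code_path(root_path: str) -> str:
    """Return the remote location described above for an absolute root_path."""
    # normpath drops a trailing slash so basename yields the last component.
    directory_name = posixpath.basename(posixpath.normpath(root_path))
    return posixpath.join("/databricks/code_source", directory_name)


print(remote_code_path("/home/username/repo"))  # /databricks/code_source/repo
```

A relative root_path such as . would need to be resolved to an absolute path first, because its final component is not a directory name.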

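Git repositories: pin by branch or commit

For git repositories, you can add a git: block to pin the code version by branch or commit SHA. branch and commit are mutually exclusive; specify exactly one in the block.

Pin to a branch (uses the code at that branch's local HEAD):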

code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
    git:
      branch: main # Uses local HEAD of main (no remote fetch)
command: train.sh

Pin to a commit SHA (exact reproducibility):

code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
    git:
      commit: abc1234567 # Pins specific commit
command: train.sh

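Key fields:

root_path (required): Local path to the root of your git repository.
git.branch (optional): Branch name. Uses the local HEAD; nothing is fetched from the remote. Mutually exclusive with git.commit.
git.commit (optional): A specific commit SHA. Mutually exclusive with git.branch.
git.remote (optional): Use the branch's remote HEAD instead of the local one. Set to true to auto-detect the remote, or to a remote name (for example, upstream) to fetch from a specific remote. Only valid with git.branch.

If you omit the git: block, air packages the working tree into a plain tarball, including any uncommitted changes; no additional fields are needed.

Non-git directories

You can snapshot a directory that is not a git repository. Omit the git: block, because it requires root_path to be a git repository. Without it, there is no version caching; every run uploads a new tarball.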

code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/my_project
command: $CODE_SOURCE_PATH/train.py

Folder filtering with `include_paths`

For large monorepos, snapshot only specific folders to reduce upload and download time and snapshot size:

code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
    include_paths:
      - research/models
      - research/common
      - research/configs
command: python $CODE_SOURCE_PATH/research/models/launch_training.py

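Key points:

This field is optional. If omitted, the entire repository is included by default.
Paths must be relative to the repository root (no leading /).
.. is not allowed; you cannot reference parent directories.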
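
The two path rules above can be checked locally before submitting. This is a minimal sketch with a hypothetical helper, check_include_paths; the CLI may apply additional checks that are not covered here.

```python
from pathlib import PurePosixPath


def check_include_paths(paths: list[str]) -> list[str]:
    """Return a message for each entry that breaks the rules listed above."""
    problems = []
    for entry in paths:
        path = PurePosixPath(entry)
        if path.is_absolute():
            problems.append(f"{entry}: must be relative (no leading /)")
        elif ".." in path.parts:
            problems.append(f"{entry}: '..' is not allowed")
    return problems


print(check_include_paths(["research/models", "/abs/path", "../sibling"]))
```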

Advanced features

Custom hyperparameters

Pass structured configuration to your training script through HYPERPARAMETERS_PATH:

experiment_name: parameterized-training
environment:
  dependencies: requirements.yaml
compute:
  num_accelerators: 8
  accelerator_type: GPU_8xH100
code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
    git:
      branch: main
command: torchrun --nproc_per_node=8 train.py
parameters:
  model:
    name: 'gpt2'
    hidden_size: 768
  training:
    batch_size: 32
    learning_rate: 0.0001

Read them in your script:

import os
import yaml

with open(os.environ['HYPERPARAMETERS_PATH']) as f:
    params = yaml.safe_load(f)

learning_rate = params['training']['learning_rate']
model_name = params['model']['name']

Job reliability

experiment_name: reliable-training
environment:
  dependencies: requirements.yaml
compute:
  num_accelerators: 8
  accelerator_type: GPU_8xH100
code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo
    git:
      branch: main
command: torchrun --nproc_per_node=8 train.py
max_retries: 2
timeout_minutes: 90

If the workload fails, it is retried up to two times. Each attempt has 90 minutes to complete, so the wall-clock budget is 90 × 3 = 270 minutes.
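
The same arithmetic, written as a small helper for other combinations of timeout_minutes and max_retries. The function name worst_case_minutes is illustrative, and the sketch counts only the attempts themselves, not any scheduling or startup time between them.

```python
def worst_case_minutes(timeout_minutes: int, max_retries: int) -> int:
    """Total wall-clock budget: one initial attempt plus each retry."""
    attempts = 1 + max_retries
    return timeout_minutes * attempts


print(worst_case_minutes(timeout_minutes=90, max_retries=2))  # 270
```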

Cost attribution

Attach the workload to an existing budget policy with usage_policy_id. For setup, see "Attribute usage with serverless usage policies".

experiment_name: my-training
environment:
  dependencies: requirements.yaml
compute:
  num_accelerators: 1
  accelerator_type: GPU_1xA10
command: echo "Hello World"
usage_policy_id: abcd123-25b8-3e87-9a2c-f86eb19d101c

Reference

Core fields

Field	Type	Description	Example
`experiment_name`	string	MLflow experiment name.	`"my-training-job"`
`environment.dependencies`	string	Path to the dependencies file.	`"requirements.yaml"`
`compute.num_accelerators`	int	Number of GPUs.	`1`, `4`, `8`
`compute.accelerator_type`	string	GPU type.	`"GPU_1xA10"`, `"GPU_8xH100"`
`code_source`	dict	Code source configuration.	See "Working with code source".
`command`	string	Bash command that launches training.	`torchrun --nproc_per_node=8 train.py`

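Supported GPU types

`accelerator_type`	GPUs per node	Notes
`GPU_1xA10`	1	Single A10; suited to development and small workloads.
`GPU_1xH100`	1	Single H100.
`GPU_8xH100`	8	Full H100 node; the typical configuration for distributed training.

For accelerator capabilities and recommended use cases, see Hardware options.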
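
If a launcher script needs the per-node GPU count, for example to fill in --nproc_per_node, the table above can be encoded as a lookup. This is only a sketch: the dict and function names are illustrative, and the values are copied from this page, so they may lag behind the types the CLI actually accepts.

```python
# Values copied from the table above; names are illustrative.
GPUS_PER_NODE = {"GPU_1xA10": 1, "GPU_1xH100": 1, "GPU_8xH100": 8}


def gpus_per_node(accelerator_type: str) -> int:
    """Look up the GPU count per node for a supported accelerator_type."""
    try:
        return GPUS_PER_NODE[accelerator_type]
    except KeyError:
        raise ValueError(f"unsupported accelerator_type: {accelerator_type!r}") from None


print(gpus_per_node("GPU_8xH100"))  # 8
```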

Optional fields

Environment configuration

environment:
  dependencies: requirements.yaml
env_variables:
  BATCH_SIZE: '32'
secrets:
  HF_TOKEN: 'my_scope/hf_token'

For the dependency file format, see the requirements.yaml reference.

Code source configuration

code_source:
  type: snapshot
  snapshot:
    root_path: /home/username/repo # REQUIRED — local path to repo or directory
    git: # Optional (git repos only) — pin to a branch or commit
      branch: main # Branch name; uses local HEAD unless 'remote' is set
      # commit: abc1234567 # Mutually exclusive with 'branch'
      remote: false # Optional — true to auto-detect remote HEAD, or a remote name string
    include_paths: # Optional — filter included paths
      - src/
      - configs/

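Field constraints:

git.branch and git.commit are mutually exclusive; specify exactly one in the git: block.
git.remote requires git.branch (it has no effect with git.commit).
If the git: block is omitted, the working tree is packaged as a plain tarball, including any uncommitted changes.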
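
The first two constraints can be expressed as a local check on the git: block once it is loaded into a dict. The helper check_git_block is hypothetical. It reports git.remote without git.branch as a problem, although the CLI may simply ignore the setting in that case.

```python
def check_git_block(git: dict) -> list[str]:
    """Return a message for each constraint above that the git: block breaks."""
    problems = []
    has_branch = "branch" in git
    has_commit = "commit" in git
    # Exactly one of branch/commit must be present.
    if has_branch == has_commit:
        problems.append("specify exactly one of git.branch or git.commit")
    # remote may be true or a remote name; either needs a branch.
    if git.get("remote") and not has_branch:
        problems.append("git.remote requires git.branch")
    return problems


print(check_git_block({"branch": "main", "remote": False}))  # []
print(check_git_block({"commit": "abc1234567", "remote": "upstream"}))
```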

Custom parameters

Passed to the workload through HYPERPARAMETERS_PATH:

parameters:
  model:
    name: 'gpt2'
    hidden_size: 768
  training:
    batch_size: 32

MLflow run name

mlflow_run_name: 'experiment-001-baseline'

Path resolution

All paths in the workload YAML are resolved relative to the workload YAML file, unless they are absolute.

Folder structure:

/home/username/my-project/
├── train.yaml
├── requirements.yaml
└── scripts/
    └── train.py

YAML configuration:

experiment_name: my-training
environment:
  dependencies: requirements.yaml # Relative to train.yaml
compute:
  num_accelerators: 8
  accelerator_type: GPU_8xH100
code_source:
  type: snapshot
  snapshot:
    root_path: . # Relative to train.yaml
    git:
      branch: main
command: torchrun --nproc_per_node=8 $CODE_SOURCE_PATH/scripts/train.py
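
The resolution rule can be sketched in a few lines, using the folder layout from the example above. The function name resolve_workload_path is illustrative, and the sketch only models the stated rule (relative to the YAML file unless absolute), not how the CLI implements it.

```python
import posixpath


def resolve_workload_path(yaml_file: str, value: str) -> str:
    """Resolve a path from the YAML relative to the YAML file's directory."""
    if posixpath.isabs(value):
        return posixpath.normpath(value)
    base_dir = posixpath.dirname(yaml_file)
    return posixpath.normpath(posixpath.join(base_dir, value))


yaml_file = "/home/username/my-project/train.yaml"
print(resolve_workload_path(yaml_file, "requirements.yaml"))
print(resolve_workload_path(yaml_file, "."))
```

With this layout, dependencies resolves to /home/username/my-project/requirements.yaml and root_path resolves to /home/username/my-project.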

Last updated on 2026-06-01
