Fine-tune Hugging Face models on a single GPU

2025-03-25

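This article describes how to fine-tune a Hugging Face model on a single GPU with the Hugging Face transformers library. It also includes Databricks-specific recommendations for loading data from the lakehouse and logging models to MLflow, which lets you use and govern your models on Azure Databricks.

The Hugging Face transformers library provides the Trainer utility and the Auto Model classes, which let you load and fine-tune Transformers models.

With only simple modifications, these tools can be used for the following tasks:

Loading a model to fine-tune.
Constructing the configuration for the Hugging Face Transformers Trainer utility.
Performing training on a single GPU.

See What are Hugging Face Transformers?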

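Requirements

A single-node cluster with one GPU on the driver.
The GPU version of Databricks Runtime 13.0 ML and above.
- This fine-tuning example requires the 🤗 Transformers, 🤗 Datasets, and 🤗 Evaluate packages, which are included in Databricks Runtime 13.0 ML and above.
MLflow 2.3.
Data that is prepared and loaded for fine-tuning a model with transformers.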

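Tokenize a Hugging Face dataset

Hugging Face Transformers models expect tokenized input rather than the raw text in the downloaded data. To ensure compatibility with the base model, use an AutoTokenizer loaded from the base model. Hugging Face datasets lets you apply the tokenizer consistently to both the training and the test data.

For example: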

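from transformers import AutoTokenizer

# base_model is the name or path of the checkpoint being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained(base_model)

def tokenize_function(examples):
    # Truncate long inputs here; padding is left to the data collator at batch time.
    return tokenizer(examples["text"], padding=False, truncation=True)

# batched=True passes a batch of rows to tokenize_function on each call.
train_test_tokenized = train_test_dataset.map(tokenize_function, batched=True)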
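
To make the `batched=True` contract of the tokenization step concrete, here is a minimal sketch that uses a toy whitespace tokenizer in place of `AutoTokenizer`. The names `TOY_VOCAB`, `MAX_LENGTH`, and `toy_tokenize` are hypothetical and exist only for illustration; a real tokenizer produces subword IDs and also returns an attention mask. What the sketch shows is the shape of the data: the function receives a dictionary of lists and returns a dictionary of lists, with one entry per input row, truncated but not padded.

```python
# Toy stand-in for a tokenizer, used only to illustrate the batched map contract.
# TOY_VOCAB, MAX_LENGTH, and toy_tokenize are hypothetical names, not library APIs.
TOY_VOCAB = {"<unk>": 0, "mlflow": 1, "is": 2, "great": 3, "hi": 4, "there": 5}
MAX_LENGTH = 3  # stands in for the model's maximum sequence length


def toy_tokenize(examples):
    """Map a batch {"text": [str, ...]} to {"input_ids": [[int, ...], ...]}."""
    input_ids = []
    for text in examples["text"]:
        ids = [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in text.lower().split()]
        # Like truncation=True: cut rows that exceed the maximum length.
        # Like padding=False: shorter rows keep their own length.
        input_ids.append(ids[:MAX_LENGTH])
    return {"input_ids": input_ids}


batch = {"text": ["MLflow is great today", "Hi there"]}
encoded = toy_tokenize(batch)
print(encoded)
```

The two rows come back with different lengths. That is why padding is left to the data collator, which pads each batch only as far as its longest row.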

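Set up the training configuration

The Hugging Face training configuration tools can be used to configure a Trainer. The Trainer class requires you to provide:

Metrics
A base model
A training configuration

In addition to the default loss metric that the Trainer computes, you can configure evaluation metrics. The following example adds accuracy as a metric: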

import numpy as np
import evaluate
metric = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
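
As a rough check on what `compute_metrics` calculates, the following sketch reproduces the same argmax-then-compare logic with NumPy alone. The logits and labels are made-up values chosen for illustration; they are not output from any real model.

```python
import numpy as np

# Hypothetical logits for 4 examples and 3 classes (one row per example).
logits = np.array([
    [2.0, 0.1, -1.0],   # highest score at index 0
    [0.2, 1.5, 0.3],    # highest score at index 1
    [-0.5, 0.0, 3.0],   # highest score at index 2
    [1.0, 0.9, 0.8],    # highest score at index 0
])
labels = np.array([0, 1, 2, 1])

# Same reduction as in compute_metrics: pick the highest-scoring class per row.
predictions = np.argmax(logits, axis=-1)

# Accuracy is the fraction of predictions that match the reference labels.
accuracy = float((predictions == labels).mean())
print(predictions.tolist(), accuracy)
```

Three of the four predictions match their labels, so the accuracy is 0.75.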

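Use the Auto Model classes for NLP to load the appropriate model for your task.

For text classification, use AutoModelForSequenceClassification to load a base model for text classification. When creating the model, provide the number of classes and the label mappings created during dataset preparation.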

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=len(label2id),
    label2id=label2id,
    id2label=id2label,
)
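
The `label2id` and `id2label` mappings are assumed to exist from dataset preparation, which this article does not show. As a minimal sketch, and assuming a hypothetical list of class names, they could be built like this:

```python
# Hypothetical class names; in practice these come from your dataset's label column.
label_names = ["positive", "negative", "neutral", "positive"]

# Deduplicate and sort so that the integer IDs are reproducible across runs.
id2label = {i: name for i, name in enumerate(sorted(set(label_names)))}
label2id = {name: i for i, name in id2label.items()}

num_labels = len(label2id)
print(id2label, label2id, num_labels)
```

Keeping the two dictionaries as exact inverses of each other means predictions can be mapped back to readable label names at inference time.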

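Next, create the training configuration. The TrainingArguments class lets you specify the output directory, evaluation strategy, learning rate, and other parameters.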

from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(output_dir=training_output_dir, evaluation_strategy="epoch")

Use a data collator to batch the inputs of the training and evaluation datasets. DataCollatorWithPadding gives good baseline performance for text classification.

from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer)
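
Conceptually, a padding collator pads each batch only to the length of its longest sequence rather than to a fixed global length. The sketch below illustrates that idea in plain Python. The `pad_batch` function is a hypothetical, simplified stand-in: it assumes right-side padding with a pad token ID of 0, whereas the real DataCollatorWithPadding relies on the tokenizer's own padding settings and returns tensors.

```python
def pad_batch(features, pad_token_id=0):
    """Pad every sequence in the batch to the length of the longest one."""
    longest = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        ids = f["input_ids"]
        n_pad = longest - len(ids)
        # Append pad tokens on the right and mark them with 0 in the mask.
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two tokenized rows of different lengths (the IDs are made up for illustration).
features = [{"input_ids": [101, 7, 8, 102]}, {"input_ids": [101, 9, 102]}]
padded = pad_batch(features)
print(padded)
```

Because padding happens per batch, batches of short texts stay small, which saves memory and compute on a single GPU.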

With all of these parameters constructed, you can now create a Trainer.

trainer = Trainer(
    model=model,
    args=training_args,
    # Train and evaluate on the tokenized dataset, not the raw text dataset.
    train_dataset=train_test_tokenized["train"],
    eval_dataset=train_test_tokenized["test"],
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)

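Train and log to MLflow

Hugging Face interfaces well with MLflow and automatically logs metrics during model training using the MLflowCallback. However, you must log the trained model yourself.

Wrap training in an MLflow run. Within the run, save the trained model to local disk and construct a Transformers pipeline from the tokenizer and the trained model. Finally, log the model to MLflow with mlflow.transformers.log_model.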

import mlflow
from transformers import pipeline

with mlflow.start_run() as run:
    trainer.train()
    trainer.save_model(model_output_dir)
    # Reload the saved model and wrap it with the tokenizer in an inference pipeline.
    pipe = pipeline(
        "text-classification",
        model=AutoModelForSequenceClassification.from_pretrained(model_output_dir),
        batch_size=1,
        tokenizer=tokenizer,
    )
    model_info = mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="classification",
        input_example="Hi there!",
    )

If you don't need to create a pipeline, you can pass the components used in training as a dictionary instead.

model_info = mlflow.transformers.log_model(
  transformers_model={"model": trainer.model, "tokenizer": tokenizer},
  task="text-classification",
  artifact_path="text_classifier",
  input_example=["MLflow is great!", "MLflow on Databricks is awesome!"],
)

Load the model for inference

When your model is logged and ready, loading it for inference is the same as loading an MLflow-wrapped pre-trained model.

logged_model = "runs:/{run_id}/{model_artifact_path}".format(run_id=run.info.run_id, model_artifact_path=model_artifact_path)

# Load model as a Spark UDF. Override result_type if the model does not return double values.
loaded_model_udf = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model, result_type='string')

test = test.select(test.text, test.label, loaded_model_udf(test.text).alias("prediction"))
display(test)
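
The `runs:/` URI above combines the run ID with the artifact path that was passed to `log_model`. The sketch below shows only that string construction. The helper `build_runs_uri` and the example run ID are hypothetical and used purely for illustration.

```python
def build_runs_uri(run_id: str, artifact_path: str) -> str:
    """Build an MLflow model URI of the form runs:/<run_id>/<artifact_path>."""
    return f"runs:/{run_id}/{artifact_path}"


# Hypothetical run ID; "classification" matches the artifact_path used when logging the pipeline.
example_uri = build_runs_uri("0123456789abcdef", "classification")
print(example_uri)
```

The artifact path must match the value given at logging time, otherwise the model will not be found. In recent MLflow versions the object returned by `log_model` should also expose the resolved URI as `model_info.model_uri`, which avoids building the string by hand.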

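For more information, see Deploy models using Mosaic AI Model Serving.

Troubleshoot common CUDA errors

This section describes common CUDA errors and gives guidance on how to resolve them.

OutOfMemoryError: CUDA out of memory

When training large models, a common error you may encounter is the CUDA out of memory error.

Example: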

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 666.34 MiB already allocated; 17.75 MiB free; 720.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.

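To resolve this error, try the following recommendations:

Reduce the batch size for training. You can reduce the per_device_train_batch_size value in TrainingArguments.
Use lower-precision training. You can set fp16=True in TrainingArguments.
Use gradient_accumulation_steps in TrainingArguments to effectively increase the overall batch size.
Use an 8-bit Adam optimizer.
Clean up GPU memory before training. Sometimes GPU memory is still occupied by code that is no longer in use.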
```
from numba import cuda
device = cuda.get_current_device()
device.reset()
```
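
The first three recommendations interact: a smaller per-device batch can be offset by gradient accumulation so that the effective batch size stays the same. The following sketch works through that arithmetic. The helper `effective_batch_size` and the specific values are assumptions for illustration; `per_device_train_batch_size`, `gradient_accumulation_steps`, and `fp16` are TrainingArguments parameters.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Hypothetical memory-saving settings for a single GPU.
memory_saving_kwargs = {
    "per_device_train_batch_size": 4,   # small batches lower peak activation memory
    "gradient_accumulation_steps": 8,   # accumulate gradients over 8 batches per update
    "fp16": True,                       # mixed-precision training
}

effective = effective_batch_size(
    memory_saving_kwargs["per_device_train_batch_size"],
    memory_saving_kwargs["gradient_accumulation_steps"],
)
print(effective)
```

Under these assumptions each optimizer step still sees 32 examples, while only 4 are held on the GPU at a time. The settings could then be passed along as `TrainingArguments(output_dir=training_output_dir, **memory_saving_kwargs)`.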

CUDA kernel errors

When running training, you may get CUDA kernel errors.

Example:

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging, consider passing CUDA_LAUNCH_BLOCKING=1.

To troubleshoot:

Try running the code on CPU to see whether the error is reproducible.
Another option is to get a better traceback by setting CUDA_LAUNCH_BLOCKING=1:
```
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

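Notebook: Fine-tune text classification on a single GPU

To get started quickly with example code, this example notebook provides an end-to-end example of fine-tuning a model for text classification. The other sections of this article go into more detail about using Hugging Face for fine-tuning on Azure Databricks.

Fine-tuning Hugging Face text classification models notebook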


Additional resources

Learn more about Hugging Face on Azure Databricks.

What are Hugging Face Transformers?
You can use Hugging Face Transformers models on Spark to scale out your NLP batch applications. See Model inference using Hugging Face Transformers for NLP.
