準備訓練資料以進行微調

5 分鐘

你已經找出哪種微調方法能解決你的經紀人品質挑戰。現在準備真正有效的訓練資料。優質的訓練資料決定微調是成功還是浪費資源。資料準備包含三個關鍵階段：驗證資料是否符合你所選方法的正確格式、驗證內容品質符合訓練標準，以及使用合適的工具與工作流程建立資料集。

驗證資料格式

不同的微調方法需要不同的資料結構。在訓練前，請確認你的資料是否完全符合所選方法的格式規範。

SFT 要求以 JSONL（JSON Lines）格式進行對話，包含系統指令、 使用者 提示及理想助理回應。

每個訓練範例中包含 messages 一個陣列，使用者與助理角色交替：

{"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries."}, {"role": "user", "content": "Is Xbox better than PlayStation?"}, {"role": "assistant", "content": "I apologize, but I cannot provide personal opinions. My primary job is to assist you with any issues related to your Xbox device. Do you have any Xbox-related issues that need addressing?"}]}

對於多回合對話，且只想優化特定回應，可使用可選 weight 參數（0 = 忽略，1 = 包含）用於訓練）：

{"messages": [{"role": "system", "content": "Adventure Works Trail Guide provides helpful outdoor gear recommendations."}, {"role": "user", "content": "What's the waterproof rating on the StormShield jacket?"}, {"role": "assistant", "content": "IPX4 rated.", "weight": 0}, {"role": "user", "content": "What does that mean practically?"}, {"role": "assistant", "content": "The StormShield jacket has an IPX4 rating with 3000mm hydrostatic head, which means it handles light to moderate rain well. Perfect for day hikes in variable weather, though you might want additional protection for extended downpours.", "weight": 1}]}

格式驗證檢查清單：

每一行都是有效的 JSON（不是整個檔案）
每個範例都有 messages 陣列
系統訊息（可選但建議）首先出現
角色在user與assistant之間交替
助理回應展現出預期的行為
可選 weight 參數為 0 或 1（省略則預設為 1）

DPO 要求以 JSONL 格式設定偏好對，並包含三個頂層欄位： input （包含系統訊息與初始使用者訊息）、 preferred_output （較佳回應）及 non_preferred_output （較差回應）。

每個訓練範例將提示（input）與兩個替代回應（preferred_outputnon_preferred_output和）分開：

{  
  "input": {  
    "messages": [{"role": "system", "content": "You are an Adventure Works outdoor gear advisor. Provide helpful safety recommendations that balance caution with encouragement for outdoor activities."}, {"role": "user", "content": "I'm planning my first winter mountain hike in Scotland. What safety gear should I bring?"}]
  },  
  "preferred_output": [{"role": "assistant", "content": "Winter mountain hiking in Scotland requires serious preparation, and it's great that you're thinking about safety! Here's what you'll need: navigation tools (map, compass, GPS), insulated and waterproof layers, emergency shelter, headlamp with spare batteries, first aid kit, and extra food and water. Given this is your first winter mountain hike, I'd also strongly recommend: going with an experienced group, checking weather forecasts carefully, and starting with a shorter route to build experience. The Scottish mountains demand respect, but with proper preparation, they offer incredible experiences."}],  
  "non_preferred_output": [{"role": "assistant", "content": "Winter mountaineering is extremely dangerous for beginners. You need avalanche transceivers, ice axes, crampons, emergency shelter, PLB, first aid, backup navigation. Do not attempt this without professional training and experienced guides."}]
}

格式驗證檢查清單：

每一行都是有效的 JSON（不是整個檔案）
每個例子都有 input、 preferred_output和 non_preferred_output 域
input 包含 messages 一個陣列，包含系統訊息（可選）與使用者訊息
preferred_output 包含至少一則輔助訊息，展示偏好行為
non_preferred_output 至少包含一則示範非偏好行為的助理訊息
偏好與非偏好的差異反映你正在優化的具體品質（音色、安全平衡、風格）

RFT 需要 JSONL 格式的提示，並有一個獨立的評分功能來評分模型回應。模型透過強化學習產生回應，並從評分者獲得獎勵分數。訓練資料可以包含可選欄位（如 ideal_itinerary），評分者可存取以進行評分。

每個訓練範例都包含 messages 一個以使用者訊息結尾的陣列，以及評分者用來評分的選用欄位：

{
  "messages": [
    {
      "role": "system",
      "content": "You are a trip planning assistant for Adventure Works. Create multi-day hiking itineraries that balance fitness level, experience, budget, weather conditions, and safety."
    },
    {
      "role": "user",
      "content": "Plan a 3-day coastal hiking trip in Wales for April. I have moderate fitness, limited camping experience, and a £500 budget."
    }
  ],
  "ideal_itinerary": "Day 1: Marloes to Broad Haven (8 miles, +800ft) - Stay at Broad Haven campground (facilities for beginners, £12/night). Day 2: Broad Haven to Little Haven (5 miles, +400ft) - Stay at Little Haven campground (£14/night). Day 3: Little Haven to St Brides (6 miles, +600ft). Total: 19 miles over 3 days (conservative for moderate fitness), established campsites (suitable for limited experience), total camping cost £26 + food/transport under £500. April weather: pack waterproofs, expect 10-15°C."
}

評分功能 （另定義，評分訓練期間的模型反應）：

{
  "type": "text_similarity",
  "name": "itinerary_similarity_grader",
  "input": "{{ sample.output_text }}",
  "reference": "{{ item.ideal_itinerary }}",
  "evaluation_metric": "fuzzy_match"
}

此文本相似度評分器會利用模糊匹配（RapidFuzz 演算法）將模型生成的行程與 ideal_itinerary 田野進行比較，並根據相似度回傳分數。

格式驗證檢查清單：

訓練資料遵循 JSONL 格式，並搭配 messages 陣列
最終訊息必須具有 user 角色 (模型會在訓練期間產生回應)
必須同時提供訓練與驗證資料集
可選欄位（如 ideal_itinerary 或 solution）可供評分者存取
評分器函數必須另行定義以評分模型回應（文字比較、基於模型、自訂 Python 程式碼或多重評分器）

驗證資料品質

即使格式正確，若內容品質不足，仍可能失敗。確認格式合規後，驗證你的問題、回答及實地內容是否符合這些品質標準。

圖示顯示訓練資料的五項品質原則：一致性、準確性、多樣性、清晰度與代表性。

高品質的訓練資料無論如何微調，都具有可預測的特性。請根據以下五項原則評估您的資料集：

一致性：每個例子都能展現你想要強化的具體行為。 Adventure Works 會審查每項裝備規格回應，確保其符合標準格式：技術規格、價格、供應情況及相關建議均以一致順序呈現。訓練資料中的混合格式會產生輸出的混合格式。

準確性：範例包含事實正確的資訊及適當的建議。一個不準確的訓練範例（例如建議冬季裝備適合夏季）可能會破壞模型在相關情境下的行為。在包含範例前，請先驗證網域的正確性。

多樣性：訓練資料涵蓋模型可能遇到的各種查詢變化、邊緣案例與回應情境。 Adventure Works 確保其安全資料集涵蓋不同經驗層級、不同季節、多元活動類型及多元地理環境（而非僅限於某一地區的夏季健行）。

清晰度：每個範例都明確展示一種期望行為。避免出現「正確」回應需要主觀解讀，或存在多種有效方法的例子。除非您使用明確指出您偏好哪種方法的偏好配對 (DPO)。

代表性：訓練資料分布符合真實世界的使用模式。如果 Adventure Works 的查詢中有 40% 是關於防水評分，但防水範例只佔訓練資料的 5%，那麼模型在這種常見的使用情境中表現不佳。

小提示

質量勝於數量：100 個高品質且多元的範例優於 500 個平庸範例。從你最好的 50 到 100 個範例開始，微調、評估，然後決定是否要擴充數量或根據失敗分析改進現有範例。

建立你的資料集

遵循此系統化工作流程，建立符合格式要求與品質標準的訓練資料。

選擇你的資料蒐集策略：當你有紀錄的互動時，真實資料發揮作用。合成生成適用於範例稀少或包含敏感資訊的情況。混合型結合了兩者，例如你可以用真實數據處理常見情境，合成數據處理邊緣情況。
準備您的來源資料：若為合成資料，請準備乾淨的 PDF/Markdown 以產生問答內容，或準備 OpenAPI 規格以產生工具使用內容。要取得真實資料：收集聊天記錄、客服工單或有紀錄的互動紀錄。移除不必要的格式與行銷內容。
產生或策劃訓練範例：使用 Foundry 的合成資料產生器（簡單問答或工具使用）來建立 JSONL 範例，或手動將真實互動結構化為 JSONL 格式。
驗證格式與品質：確認格式合規性及品質原則。請檢視合成範例以防錯誤資訊。從 50 到 100 個範例開始，徹底驗證，然後逐步擴展。
監控並審核模型效能：在微調後，評估模型基準與驗證數據的指標。如果表現不佳，分析失敗案例，判斷是否需要更多、更好或不同的範例。

小提示

建立格式正確且高品質的訓練資料後，你就準備好學習如何優化模型的微調。

意見反應

此頁面對您有幫助嗎？