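Run evaluations in the cloud by using the Microsoft Foundry SDK

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you learn how to run evaluations in the cloud (preview) for predeployment testing on a test dataset.

Use cloud evaluation for most scenarios, especially when you test at scale, integrate evaluations into continuous integration and continuous delivery (CI/CD) pipelines, or perform predeployment testing. Running evaluations in the cloud eliminates the need to manage local compute infrastructure and supports large-scale, automated testing workflows. You can also schedule evaluations to run on a recurring basis, or set up continuous evaluation to automatically evaluate sampled agent responses.

Cloud evaluation results are stored in your Foundry project. You can review results in the portal, retrieve them through the SDK, or route them to Application Insights if it's connected. Cloud evaluation supports all Microsoft-curated built-in evaluators and your own custom evaluators. Evaluators are managed in the evaluator catalog with the same project-scoped, role-based access control.

Tip

For complete runnable examples, see the Python SDK evaluation samples on GitHub.

When you use the Foundry SDK, it logs evaluation results in your Foundry project for better observability. This feature supports all Microsoft-curated built-in evaluators and your own custom evaluators. Your evaluators can be located in the evaluator library and have the same project-scoped, role-based access control.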

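How cloud evaluation works

To run a cloud evaluation, you first create an evaluation definition that contains your data schema and testing criteria (evaluators), and then create an evaluation run. The run executes each evaluator against your data and returns scored results that you can poll for completion.

Cloud evaluation supports the following scenarios: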

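Scenario	When to use	Data source type	Target
Dataset evaluation	Evaluate precomputed responses stored in a JSONL file.	`jsonl`	None
Model target evaluation	Provide queries and generate responses from a model at runtime for evaluation.	`azure_ai_target_completions`	`azure_ai_model`
Agent target evaluation	Provide queries and generate responses from a Foundry agent at runtime for evaluation.	`azure_ai_target_completions`	`azure_ai_agent`
Agent response evaluation	Retrieve and evaluate Foundry agent responses by response ID.	`azure_ai_responses`	None
Red team evaluation	Run automated adversarial testing against a model or agent.	`azure_ai_red_team`	`azure_ai_model` or `azure_ai_agent`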

Most scenarios require input data. You can provide data in two ways:

Source type	Description
`file_id`	Reference an uploaded dataset by ID.
`file_content`	Provide data inline in the request.

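Every evaluation requires a data_source_config that tells the service which fields to expect in your data.

custom: You define an item_schema with your field names and types. When you use a target, set include_sample_schema to true so that evaluators can reference the generated responses.
azure_ai_source: The schema is inferred from the service. Set "scenario" to "responses" for agent response evaluation or to "red_team" for red teaming.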
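
As a minimal sketch of the two shapes described above, the plain dictionaries below use only the fields named in this article (type, item_schema, include_sample_schema, scenario). The helper names custom_config and azure_ai_source_config are hypothetical, and treating every field as a required string is an assumption made for brevity.

```python
from typing import Any


def custom_config(fields: list[str], include_sample_schema: bool = False) -> dict[str, Any]:
    """Build a "custom" data_source_config in which every field is a required string."""
    config: dict[str, Any] = {
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {name: {"type": "string"} for name in fields},
            "required": list(fields),
        },
    }
    if include_sample_schema:
        # Set when a target generates responses at run time.
        config["include_sample_schema"] = True
    return config


def azure_ai_source_config(scenario: str) -> dict[str, str]:
    """Build an "azure_ai_source" config; the service infers the schema."""
    if scenario not in ("responses", "red_team"):
        raise ValueError(f"unsupported scenario: {scenario}")
    return {"type": "azure_ai_source", "scenario": scenario}


dataset_config = custom_config(["query", "response", "ground_truth"])
target_config = custom_config(["query"], include_sample_schema=True)
responses_config = azure_ai_source_config("responses")
```

The scenario sections later in this article build the same structures with DataSourceConfigCustom or a literal dictionary.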

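Every scenario requires evaluators that define your testing criteria. For guidance on selecting evaluators, see built-in evaluators.

Prerequisites

A Foundry project.
An Azure OpenAI deployment with a GPT model that supports chat completion (for example, gpt-5-mini).
The Azure AI User role on the Foundry project.
Optionally, you can use your own storage account to run evaluations.

Note

Some evaluation features have regional restrictions. For details, see supported regions.

Get started

Install the SDK and set up your client: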

pip install "azure-ai-projects>=2.0.0b1" azure-identity openai

import os
from azure.identity import DefaultAzureCredential 
from azure.ai.projects import AIProjectClient 
from openai.types.eval_create_params import DataSourceConfigCustom
from openai.types.evals.create_eval_jsonl_run_data_source_param import (
    CreateEvalJSONLRunDataSourceParam,
    SourceFileContent,
    SourceFileContentContent,
    SourceFileID,
)

# Azure AI Project endpoint
# Example: https://<account_name>.services.ai.azure.com/api/projects/<project_name>
endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]

# Model deployment name (for AI-assisted evaluators)
# Example: gpt-5-mini
model_deployment_name = os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "")

# Dataset details (optional, for reusing existing datasets)
dataset_name = os.environ.get("DATASET_NAME", "")
dataset_version = os.environ.get("DATASET_VERSION", "1")

# Create the project client
project_client = AIProjectClient( 
    endpoint=endpoint, 
    credential=DefaultAzureCredential(), 
)

# Get the OpenAI client for evaluation API
client = project_client.get_openai_client()

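Prepare input data

Most evaluation scenarios require input data. You can provide data in two ways:

Upload a dataset (recommended)

Upload a JSONL file to create a versioned dataset in your Foundry project. Datasets support versioning and can be reused across multiple evaluation runs. Use this approach for production testing and CI/CD workflows.

Prepare a JSONL file in which each line is one JSON object that contains the fields your evaluators need: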

{"query": "What is machine learning?", "response": "Machine learning is a subset of AI.", "ground_truth": "Machine learning is a type of AI that learns from data."}
{"query": "Explain neural networks.", "response": "Neural networks are computing systems inspired by biological neural networks.", "ground_truth": "Neural networks are a set of algorithms modeled after the human brain."}

# Upload a local JSONL file. Skip this step if you already have a dataset registered.
data_id = project_client.datasets.upload_file(
    name=dataset_name,
    version=dataset_version,
    file_path="./evaluate_test_data.jsonl",
).id
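
Before uploading, a quick local check can catch malformed lines early. The following is a minimal sketch that uses only the Python standard library; the helper names write_jsonl and load_and_check_jsonl are hypothetical, and the required field list assumes the three fields from the sample rows above.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("query", "response", "ground_truth")


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def load_and_check_jsonl(path, required_fields=REQUIRED_FIELDS):
    """Parse a JSONL file and fail on the first malformed line."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # Ignore blank lines.
            try:
                row = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {line_number}: invalid JSON ({error.msg})") from error
            if not isinstance(row, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = [name for name in required_fields if name not in row]
            if missing:
                raise ValueError(f"line {line_number}: missing fields {missing}")
            rows.append(row)
    return rows


sample_rows = [
    {
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of AI.",
        "ground_truth": "Machine learning is a type of AI that learns from data.",
    },
    {
        "query": "Explain neural networks.",
        "response": "Neural networks are computing systems inspired by biological neural networks.",
        "ground_truth": "Neural networks are a set of algorithms modeled after the human brain.",
    },
]

# Write to a temporary folder so that the sketch doesn't touch your project files.
jsonl_path = os.path.join(tempfile.mkdtemp(), "evaluate_test_data.jsonl")
write_jsonl(jsonl_path, sample_rows)
checked_rows = load_and_check_jsonl(jsonl_path)
```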

Provide data inline

For quick experimentation with small test sets, provide data directly in the evaluation request by using file_content.

source = SourceFileContent(
    type="file_content",
    content=[
        SourceFileContentContent(
            item={
                "query": "How can I safely de-escalate a tense situation?",
                "ground_truth": "Encourage calm communication, seek help if needed, and avoid harm.",
            }
        ),
        SourceFileContentContent(
            item={
                "query": "What is the largest city in France?",
                "ground_truth": "Paris",
            }
        ),
    ],
)

When you create a run, pass source as the "source" field in your data source configuration. The scenario sections that follow use file_id by default.
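
As a rough illustration of where the inline source ends up, the sketch below builds the same structure as plain dictionaries. The helper names inline_source and jsonl_data_source are hypothetical; the dictionary keys are the ones used in this article.

```python
def inline_source(rows):
    """Wrap rows in the file_content shape: one {"item": row} entry per row."""
    return {
        "type": "file_content",
        "content": [{"item": dict(row)} for row in rows],
    }


def jsonl_data_source(source):
    """Place a source (file_content or file_id) in a jsonl data source."""
    return {"type": "jsonl", "source": source}


inline_rows = [
    {"query": "What is the largest city in France?", "ground_truth": "Paris"},
]
inline_data_source = jsonl_data_source(inline_source(inline_rows))
```

A dictionary like inline_data_source is what you would pass as data_source when you create a run, in place of the file_id form.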

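Dataset evaluation

Use the jsonl data source type to evaluate precomputed responses stored in a JSONL file. This scenario is useful when you already have model outputs and want to assess their quality.

Tip

Before you begin, complete Get started and Prepare input data.

Define the data schema and evaluators

Specify a schema that matches your JSONL fields, and select the evaluators (testing criteria) to run. Use the data_mapping parameter to connect fields from your input data to evaluator parameters with the {{item.field}} syntax. Always include data_mapping with the input fields that each evaluator requires. Your field names must match the names in your JSONL file. For example, if your data uses "question" instead of "query", use "{{item.question}}" in the mapping. For the parameters that each evaluator requires, see built-in evaluators.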
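
Mismatched field names are easy to miss, so it can help to check mappings locally before you create the evaluation. The sketch below is one possible check, not part of the SDK: the helper unknown_item_fields is hypothetical, and it assumes placeholders are written exactly as {{item.name}} with no extra whitespace.

```python
import re

# Matches placeholders such as {{item.query}} and captures the field name.
ITEM_REFERENCE = re.compile(r"\{\{item\.(\w+)\}\}")


def unknown_item_fields(testing_criteria, item_schema):
    """Return {criterion name: [fields referenced but absent from the schema]}."""
    known = set(item_schema.get("properties", {}))
    problems = {}
    for criterion in testing_criteria:
        for template in criterion.get("data_mapping", {}).values():
            for field in ITEM_REFERENCE.findall(template):
                if field not in known:
                    problems.setdefault(criterion["name"], []).append(field)
    return problems


# The data uses "question", but the mapping still refers to "query".
schema = {
    "type": "object",
    "properties": {"question": {"type": "string"}, "response": {"type": "string"}},
}
criteria = [
    {
        "name": "coherence",
        "data_mapping": {"query": "{{item.query}}", "response": "{{item.response}}"},
    }
]
mismatches = unknown_item_fields(criteria, schema)

# Pointing the mapping at the real field name clears the problem.
criteria[0]["data_mapping"]["query"] = "{{item.question}}"
mismatches_after_fix = unknown_item_fields(criteria, schema)
```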

Python
cURL

data_source_config = DataSourceConfigCustom(
    type="custom",
    item_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "response": {"type": "string"},
            "ground_truth": {"type": "string"},
        },
        "required": ["query", "response", "ground_truth"],
    },
)

testing_criteria = [
    {
        "type": "azure_ai_evaluator",
        "name": "coherence",
        "evaluator_name": "builtin.coherence",
        "initialization_parameters": {
            "deployment_name": model_deployment_name
        },
        "data_mapping": {
            "query": "{{item.query}}",
            "response": "{{item.response}}",
        },
    },
    {
        "type": "azure_ai_evaluator",
        "name": "violence",
        "evaluator_name": "builtin.violence",
        "initialization_parameters": {
            "deployment_name": model_deployment_name
        },
        "data_mapping": {
            "query": "{{item.query}}",
            "response": "{{item.response}}",
        },
    },
    {
        "type": "azure_ai_evaluator",
        "name": "f1",
        "evaluator_name": "builtin.f1_score",
        "data_mapping": {
            "response": "{{item.response}}",
            "ground_truth": "{{item.ground_truth}}",
        },
    },
]

curl --request POST \
  --url "https://${ACCOUNT}.services.ai.azure.com/api/projects/${PROJECT}/openai/evals?api-version=v1" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "dataset-evaluation",
    "data_source_config": {
      "type": "custom",
      "item_schema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "response": { "type": "string" },
          "ground_truth": { "type": "string" }
        },
        "required": ["query", "response", "ground_truth"]
      }
    },
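    "testing_criteria": [
      {
        "type": "azure_ai_evaluator", "name": "coherence",
        "evaluator_name": "builtin.coherence",
        "initialization_parameters": { "deployment_name": "gpt-5-mini" },
        "data_mapping": { "query": "{{item.query}}", "response": "{{item.response}}" }
      },
      {
        "type": "azure_ai_evaluator", "name": "violence",
        "evaluator_name": "builtin.violence",
        "initialization_parameters": { "deployment_name": "gpt-5-mini" },
        "data_mapping": { "query": "{{item.query}}", "response": "{{item.response}}" }
      },
      {
        "type": "azure_ai_evaluator", "name": "f1",
        "evaluator_name": "builtin.f1_score",
        "data_mapping": { "response": "{{item.response}}", "ground_truth": "{{item.ground_truth}}" }
      }
    ]
  }'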

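Create the evaluation and run

Create the evaluation, and then start a run against your uploaded dataset. The run executes each evaluator on every row in the dataset.

Python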
cURL

# Create the evaluation
eval_object = client.evals.create(
    name="dataset-evaluation",
    data_source_config=data_source_config,
    testing_criteria=testing_criteria,
)

# Create a run using the uploaded dataset
eval_run = client.evals.runs.create(
    eval_id=eval_object.id,
    name="dataset-run",
    data_source=CreateEvalJSONLRunDataSourceParam(
        type="jsonl",
        source=SourceFileID(
            type="file_id",
            id=data_id,
        ),
    ),
)

# Step 1: Create the evaluation
EVAL_ID=$(curl --silent --request POST \
  --url "https://${ACCOUNT}.services.ai.azure.com/api/projects/${PROJECT}/openai/evals?api-version=v1" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "dataset-evaluation",
    "data_source_config": {
      "type": "custom",
      "item_schema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "response": { "type": "string" },
          "ground_truth": { "type": "string" }
        },
        "required": ["query", "response", "ground_truth"]
      }
    },
    "testing_criteria": [
      {
        "type": "azure_ai_evaluator",
        "name": "coherence",
        "evaluator_name": "builtin.coherence",
        "initialization_parameters": { "deployment_name": "gpt-5-mini" },
        "data_mapping": {
          "query": "{{item.query}}",
          "response": "{{item.response}}"
        }
      },
      {
        "type": "azure_ai_evaluator",
        "name": "violence",
        "evaluator_name": "builtin.violence",
        "initialization_parameters": { "deployment_name": "gpt-5-mini" },
        "data_mapping": {
          "query": "{{item.query}}",
          "response": "{{item.response}}"
        }
      },
      {
        "type": "azure_ai_evaluator",
        "name": "f1",
        "evaluator_name": "builtin.f1_score",
        "data_mapping": {
          "response": "{{item.response}}",
          "ground_truth": "{{item.ground_truth}}"
        }
      }
    ]
  }' | jq -r '.id')

# Step 2: Create a run against your dataset
curl --request POST \
  --url "https://${ACCOUNT}.services.ai.azure.com/api/projects/${PROJECT}/openai/evals/${EVAL_ID}/runs?api-version=v1" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "dataset-run",
    "data_source": {
      "type": "jsonl",
      "source": {
        "type": "file_id",
        "id": "YOUR_DATASET_ID"
      }
    }
  }'

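For a complete runnable example, see sample_evaluations_builtin_with_dataset_id.py on GitHub. To poll for completion and interpret results, see Get results.

Model target evaluation

Send queries to a deployed model at runtime and evaluate the responses by using the azure_ai_target_completions data source type with an azure_ai_model target. Your input data contains the queries. The model generates the responses, which are then evaluated.

Tip

Before you begin, complete Get started and Prepare input data.

Define the message template and target

The input_messages template controls how queries are sent to the model. Use {{item.query}} to reference fields in your input data. Specify the model to evaluate and optional sampling parameters: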
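
To preview what a template produces for one row, you can substitute the placeholders locally. The sketch below is only an approximation for debugging: render_messages is a hypothetical helper, not the service's renderer, and it assumes placeholders are written exactly as {{item.name}}.

```python
import re

PLACEHOLDER = re.compile(r"\{\{item\.(\w+)\}\}")

# Same template shape as the input_messages used in this section.
preview_template = {
    "type": "template",
    "template": [
        {
            "type": "message",
            "role": "user",
            "content": {"type": "input_text", "text": "{{item.query}}"},
        }
    ],
}


def render_messages(input_messages, item):
    """Fill {{item.field}} placeholders with values from one input row."""
    rendered = []
    for message in input_messages["template"]:
        text = message["content"]["text"]
        # A missing field raises KeyError, which surfaces mapping mistakes early.
        filled = PLACEHOLDER.sub(lambda match: str(item[match.group(1)]), text)
        rendered.append({"role": message["role"], "text": filled})
    return rendered


preview = render_messages(preview_template, {"query": "What is machine learning?"})
```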

input_messages = {
    "type": "template",
    "template": [
        {
            "type": "message",
            "role": "user",
            "content": {
                "type": "input_text",
                "text": "{{item.query}}"
            }
        }
    ]
}

target = {
    "type": "azure_ai_model",
    "model": "gpt-5-mini",
    "sampling_params": {
        "top_p": 1.0,
        "max_completion_tokens": 2048,
    },
}

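Set up evaluators and data mappings

When the model generates responses at runtime, use {{sample.output_text}} in data_mapping to reference the model's output. Use {{item.field}} to reference fields in your input data.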

data_source_config = DataSourceConfigCustom(
    type="custom",
    item_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
        },
        "required": ["query"],
    },
    include_sample_schema=True,
)

testing_criteria = [
    {
        "type": "azure_ai_evaluator",
        "name": "coherence",
        "evaluator_name": "builtin.coherence",
        "initialization_parameters": {
            "deployment_name": model_deployment_name,
        },
        "data_mapping": {
            "query": "{{item.query}}",
            "response": "{{sample.output_text}}",
        },
    },
    {
        "type": "azure_ai_evaluator",
        "name": "violence",
        "evaluator_name": "builtin.violence",
        "data_mapping": {
            "query": "{{item.query}}",
            "response": "{{sample.output_text}}",
        },
    },
]

Create the evaluation and run

eval_object = client.evals.create(
    name="Model Target Evaluation",
    data_source_config=data_source_config,
    testing_criteria=testing_criteria,
)

data_source = {
    "type": "azure_ai_target_completions",
    "source": {
        "type": "file_id",
        "id": data_id,
    },
    "input_messages": input_messages,
    "target": target,
}

eval_run = client.evals.runs.create(
    eval_id=eval_object.id,
    name="model-target-evaluation",
    data_source=data_source,
)

curl --request POST \
  --url "https://${ACCOUNT}.services.ai.azure.com/api/projects/${PROJECT}/openai/evals/${EVAL_ID}/runs?api-version=v1" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "model-target-evaluation",
    "data_source": {
      "type": "azure_ai_target_completions",
      "source": {
        "type": "file_id",
        "id": "YOUR_DATASET_ID"
      },
      "input_messages": {
        "type": "template",
        "template": [
          {
            "type": "message",
            "role": "user",
            "content": {
              "type": "input_text",
              "text": "{{item.query}}"
            }
          }
        ]
      },
      "target": {
        "type": "azure_ai_model",
        "model": "gpt-5-mini",
        "sampling_params": {
          "top_p": 1.0,
          "max_completion_tokens": 2048
        }
      }
    }
  }'

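For a complete runnable example, see sample_model_evaluation.py on GitHub. To poll for completion and interpret results, see Get results.

Tip

To start another evaluation run, you can reuse the same code.

Agent target evaluation

Send queries to a Foundry agent at runtime and evaluate the responses by using the azure_ai_target_completions data source type with an azure_ai_agent target.

Tip

Before you begin, complete Get started and Prepare input data.

Define the message template and target

The input_messages template controls how queries are sent to the agent. Use {{item.query}} to reference fields in your input data. Specify the name of the agent to evaluate: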

input_messages = {
    "type": "template",
    "template": [
        {
            "type": "message",
            "role": "developer",
            "content": {
                "type": "input_text",
                "text": "You are a helpful assistant. Answer clearly and safely."
            }
        },
        {
            "type": "message",
            "role": "user",
            "content": {
                "type": "input_text",
                "text": "{{item.query}}"
            }
        }
    ]
}

target = {
    "type": "azure_ai_agent",
    "name": "my-agent",
    "version": "1"  # Optional. Uses latest version if omitted.
}

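Set up evaluators and data mappings

When the agent generates responses at runtime, use {{sample.*}} variables in data_mapping to reference the agent's output.

Variable	Description	Use for
`{{sample.output_text}}`	The agent's plain-text response.	Evaluators that expect a string response (for example, `coherence`, `violence`).
`{{sample.output_items}}`	The agent's structured JSON output, including tool calls.	Evaluators that need the full interaction context (for example, `task_adherence`).
`{{item.field}}`	A field from your input data.	Input fields such as `query` or `ground_truth`.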
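
Because each criterion repeats the same keys, a small factory can keep the mappings in the table above easy to read. This is a sketch only: evaluator_criterion is a hypothetical helper, and the deployment name is a placeholder value.

```python
OUTPUT_TEXT = "{{sample.output_text}}"    # plain-text reply
OUTPUT_ITEMS = "{{sample.output_items}}"  # structured output, including tool calls
QUERY = "{{item.query}}"                  # field from the input data


def evaluator_criterion(name, evaluator_name, data_mapping, deployment_name=None):
    """Build one azure_ai_evaluator testing criterion as a plain dictionary."""
    criterion = {
        "type": "azure_ai_evaluator",
        "name": name,
        "evaluator_name": evaluator_name,
        "data_mapping": dict(data_mapping),
    }
    if deployment_name:
        # Only AI-assisted evaluators in this article's examples set a deployment.
        criterion["initialization_parameters"] = {"deployment_name": deployment_name}
    return criterion


judge_deployment = "gpt-5-mini"  # placeholder deployment name

agent_criteria = [
    evaluator_criterion(
        "coherence", "builtin.coherence",
        {"query": QUERY, "response": OUTPUT_TEXT}, judge_deployment,
    ),
    evaluator_criterion(
        "violence", "builtin.violence",
        {"query": QUERY, "response": OUTPUT_TEXT},
    ),
    evaluator_criterion(
        "task_adherence", "builtin.task_adherence",
        {"query": QUERY, "response": OUTPUT_ITEMS}, judge_deployment,
    ),
]
```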

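Tip

The query field can contain structured JSON, including system messages and conversation history. Some agent evaluators, such as task_adherence, use this context for more accurate scoring. For details about query formatting, see agent evaluators.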

data_source_config = DataSourceConfigCustom(
    type="custom",
    item_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
        },
        "required": ["query"],
    },
    include_sample_schema=True,
)

testing_criteria = [
    {
        "type": "azure_ai_evaluator",
        "name": "coherence",
        "evaluator_name": "builtin.coherence",
        "initialization_parameters": {
            "deployment_name": model_deployment_name,
        },
        "data_mapping": {
            "query": "{{item.query}}",
            "response": "{{sample.output_text}}",
        },
    },
    {
        "type": "azure_ai_evaluator",
        "name": "violence",
        "evaluator_name": "builtin.violence",
        "data_mapping": {
            "query": "{{item.query}}",
            "response": "{{sample.output_text}}",
        },
    },
    {
        "type": "azure_ai_evaluator",
        "name": "task_adherence",
        "evaluator_name": "builtin.task_adherence",
        "initialization_parameters": {
            "deployment_name": model_deployment_name,
        },
        "data_mapping": {
            "query": "{{item.query}}",
            "response": "{{sample.output_items}}",
        },
    },
]

Create the evaluation and run

Python
cURL

eval_object = client.evals.create(
    name="Agent Target Evaluation",
    data_source_config=data_source_config,
    testing_criteria=testing_criteria,
)

data_source = {
    "type": "azure_ai_target_completions",
    "source": {
        "type": "file_id",
        "id": data_id,
    },
    "input_messages": input_messages,
    "target": target,
}

agent_eval_run = client.evals.runs.create(
    eval_id=eval_object.id,
    name="agent-target-evaluation",
    data_source=data_source,
)

curl --request POST \
  --url "https://${ACCOUNT}.services.ai.azure.com/api/projects/${PROJECT}/openai/evals/${EVAL_ID}/runs?api-version=v1" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "agent-target-evaluation",
    "data_source": {
      "type": "azure_ai_target_completions",
      "source": {
        "type": "file_id",
        "id": "YOUR_DATASET_ID"
      },
      "input_messages": {
        "type": "template",
        "template": [
          {
            "type": "message",
            "role": "developer",
            "content": {
              "type": "input_text",
              "text": "You are a helpful assistant. Answer clearly and safely."
            }
          },
          {
            "type": "message",
            "role": "user",
            "content": {
              "type": "input_text",
              "text": "{{item.query}}"
            }
          }
        ]
      },
      "target": {
        "type": "azure_ai_agent",
        "name": "my-agent",
        "version": "1"
      }
    }
  }'

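For a complete runnable example, see sample_agent_evaluation.py on GitHub. To poll for completion and interpret results, see Get results.

Agent response evaluation

Retrieve and evaluate Foundry agent responses by response ID by using the azure_ai_responses data source type. Use this scenario to evaluate specific agent interactions after they occur.

Tip

Before you begin, complete Get started.

A response ID is a unique identifier that's returned each time a Foundry agent generates a response. You can collect response IDs from agent interactions by using the Responses API or from your application's trace logs. Provide the IDs inline as file content, or upload them as a dataset (see Prepare input data).

Collect response IDs

Each call to the Responses API returns a response object with a unique id field. Collect these IDs from your application's interactions, or generate them directly: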

# Generate response IDs by calling a model through the Responses API
response = client.responses.create(
    model=model_deployment_name,
    input="What is machine learning?",
)
print(response.id)  # Example: resp_abc123

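You can also collect response IDs from agent interactions in your application's trace logs or monitoring pipeline. Each response ID uniquely identifies a stored response that the evaluation service can retrieve.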
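
Once you have a list of IDs, you can turn it into a run's data source. The sketch below assembles the azure_ai_responses structure that this section uses as a plain dictionary; the helper responses_data_source is hypothetical, and dropping duplicate IDs is an assumption about what you usually want, not a service requirement.

```python
def responses_data_source(response_ids, field="resp_id"):
    """Build an azure_ai_responses data source from a list of response IDs."""
    unique_ids = list(dict.fromkeys(response_ids))  # drop duplicates, keep order
    return {
        "type": "azure_ai_responses",
        "item_generation_params": {
            "type": "response_retrieval",
            # Map each item's ID field to the response_id parameter.
            "data_mapping": {"response_id": "{{item." + field + "}}"},
            "source": {
                "type": "file_content",
                "content": [{"item": {field: rid}} for rid in unique_ids],
            },
        },
    }


collected_ids = ["resp_abc123", "resp_def456", "resp_abc123"]
responses_source = responses_data_source(collected_ids)
```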

Create the evaluation and run

Python
cURL

data_source_config = {"type": "azure_ai_source", "scenario": "responses"}

testing_criteria = [
    {
        "type": "azure_ai_evaluator",
        "name": "coherence",
        "evaluator_name": "builtin.coherence",
        "initialization_parameters": {
            "deployment_name": model_deployment_name,
        },
    },
    {
        "type": "azure_ai_evaluator",
        "name": "violence",
        "evaluator_name": "builtin.violence",
    },
]

eval_object = client.evals.create(
    name="Agent Response Evaluation",
    data_source_config=data_source_config,
    testing_criteria=testing_criteria,
)

data_source = {
    "type": "azure_ai_responses",
    "item_generation_params": {
        "type": "response_retrieval",
        "data_mapping": {"response_id": "{{item.resp_id}}"},
        "source": {
            "type": "file_content",
            "content": [
                {"item": {"resp_id": "resp_abc123"}},
                {"item": {"resp_id": "resp_def456"}},
            ]
        },
    },
}

eval_run = client.evals.runs.create(
    eval_id=eval_object.id,
    name="agent-response-evaluation",
    data_source=data_source,
)

curl --request POST \
  --url "https://${ACCOUNT}.services.ai.azure.com/api/projects/${PROJECT}/openai/evals/${EVAL_ID}/runs?api-version=v1" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "agent-response-evaluation",
    "data_source": {
      "type": "azure_ai_responses",
      "item_generation_params": {
        "type": "response_retrieval",
        "data_mapping": {"response_id": "{{item.resp_id}}"},
        "source": {
          "type": "file_content",
          "content": [
            {"item": {"resp_id": "resp_abc123"}},
            {"item": {"resp_id": "resp_def456"}}
          ]
        }
      }
    }
  }'

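For a complete runnable example, see sample_agent_response_evaluation.py on GitHub. To poll for completion and interpret results, see Get results.

Get results

After an evaluation run completes, retrieve the scored results and review them in the portal or programmatically.

Poll for results

Evaluation runs are asynchronous. Poll the run status until it completes, and then retrieve the results: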

import time
from pprint import pprint

while True:
    run = client.evals.runs.retrieve(
        run_id=eval_run.id, eval_id=eval_object.id
    )
    if run.status in ("completed", "failed", "canceled"):
        break
    time.sleep(5)
    print("Waiting for eval run to complete...")

# Retrieve results
output_items = list(
    client.evals.runs.output_items.list(
        run_id=run.id, eval_id=eval_object.id
    )
)
pprint(output_items)
print(f"Report URL: {run.report_url}")
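
For unattended jobs such as CI pipelines, you might want an upper bound on the wait. The sketch below wraps polling in a helper with a timeout and a growing delay. The helper wait_for_run and its defaults are hypothetical, the "canceled" status is an assumption about the run states, and a stand-in object replaces the real client so that the block runs on its own.

```python
import time
from types import SimpleNamespace

# "canceled" is assumed to be a terminal state alongside the two used above.
TERMINAL_STATUSES = ("completed", "failed", "canceled")


def wait_for_run(retrieve, timeout_seconds=1800.0, initial_delay=5.0,
                 max_delay=60.0, sleep=time.sleep):
    """Call retrieve() until the run reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    delay = initial_delay
    while True:
        run = retrieve()
        if run.status in TERMINAL_STATUSES:
            return run
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"run still {run.status!r} after {timeout_seconds} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # wait longer between later polls


# Stand-in for the service: reports three statuses in order.
fake_statuses = iter(["queued", "in_progress", "completed"])
poll_waits = []
final_run = wait_for_run(
    lambda: SimpleNamespace(status=next(fake_statuses)),
    sleep=poll_waits.append,  # record the delays instead of sleeping
)
```

With the real client, the retrieve argument would be a function that calls client.evals.runs.retrieve with your run_id and eval_id.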

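Interpret results

For a single data example, all evaluators output the following schema:

Label: A binary "pass" or "fail" label, similar to the output of a unit test. Use this result to make comparisons across evaluators easier.
Score: A score on the natural scale of each evaluator. Some evaluators use a fine-grained rubric and score on a 5-point scale (quality evaluators) or a 7-point scale (content safety evaluators). Others, such as textual similarity evaluators, use F1 scores, which are floats between 0 and 1. Any nonbinary score is converted to "pass" or "fail" in the label field based on the threshold.
Threshold: Any nonbinary score is converted to "pass" or "fail" based on a default threshold, which you can override in the SDK experience.
Reason: To improve intelligibility, all LLM-judge evaluators also output a reasoning field that explains why a certain score was given.
Details: (Optional) For some evaluators, such as tool_call_accuracy, there might be a "details" field or flags that contain additional information to help you debug your applications.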
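
The relationship between score, threshold, and label can be sketched as follows. This is an illustration only, not the service's implementation: the helper label_for is hypothetical, and both the direction of the comparison and the treatment of a score equal to the threshold are assumptions that can differ by evaluator.

```python
def label_for(score, threshold, higher_is_better=True):
    """Turn a numeric score into "pass" or "fail" by comparing it to a threshold."""
    if higher_is_better:
        passed = score >= threshold  # assumption: meeting the threshold passes
    else:
        passed = score <= threshold  # assumption: used when lower scores are better
    return "pass" if passed else "fail"


quality_label = label_for(4.0, 3)  # values from the single-item example in this article
low_quality_label = label_for(2.0, 3)
severity_label = label_for(5, 3, higher_is_better=False)
```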

Example output (single item)

{
  "type": "azure_ai_evaluator",
  "name": "Coherence",
  "metric": "coherence",
  "score": 4.0,
  "label": "pass",
  "reason": "The response is well-structured and logically organized, presenting information in a clear and coherent manner.",
  "threshold": 3,
  "passed": true
}

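Example output (aggregate)

For aggregate results over multiple data examples (a dataset), the proportion of examples with a "pass" label is the pass rate for that dataset.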

{
  "eval_id": "eval_abc123",
  "run_id": "run_xyz789",
  "status": "completed",
  "result_counts": {
    "passed": 85,
    "failed": 15,
    "total": 100
  },
  "per_testing_criteria_results": [
    {
      "name": "coherence",
      "passed": 92,
      "failed": 8,
      "pass_rate": 0.92
    },
    {
      "name": "relevance", 
      "passed": 78,
      "failed": 22,
      "pass_rate": 0.78
    }
  ]
}
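
The pass rate in the aggregate output is the passed count divided by the total for each criterion. The sketch below recomputes it from the counts in the example above and flags criteria below a minimum; the helper pass_rate and the 0.8 minimum are hypothetical values for a CI gate, not service defaults.

```python
def pass_rate(passed, failed):
    """Share of examples labeled "pass"; 0.0 when there are no examples."""
    total = passed + failed
    return passed / total if total else 0.0


per_criteria = [
    {"name": "coherence", "passed": 92, "failed": 8},
    {"name": "relevance", "passed": 78, "failed": 22},
]

rates = {
    entry["name"]: round(pass_rate(entry["passed"], entry["failed"]), 2)
    for entry in per_criteria
}

MINIMUM_PASS_RATE = 0.8  # hypothetical bar for a CI gate
below_bar = [name for name, rate in rates.items() if rate < MINIMUM_PASS_RATE]
```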

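Troubleshooting

Job running for a long time

Your evaluation job might remain in the Running state for an extended period. This typically occurs when the Azure OpenAI model deployment doesn't have enough capacity, which causes the service to retry requests.

Resolution:

Cancel the current evaluation job by using client.evals.runs.cancel(run_id, eval_id=eval_id).
Increase the model capacity in the Azure portal.
Run the evaluation again.

Authentication errors

If you receive a 401 Unauthorized or 403 Forbidden error, verify that:

Your DefaultAzureCredential is configured correctly (run az login if you use the Azure CLI).
Your account has the Azure AI User role on the Foundry project.
The project endpoint URL is correct and includes both the account and project names.

Data format errors

If the evaluation fails with a schema or data mapping error:

Verify that your JSONL file has exactly one valid JSON object per line.
Confirm that the field names in data_mapping exactly match the field names in your JSONL file (names are case-sensitive).
Check that the item_schema properties match the fields in your dataset.

Rate limit errors

Evaluation run creation is rate-limited at the tenant, subscription, and project levels. If you receive a 429 Too Many Requests response:

Check the retry-after header in the response for the recommended wait time.
Review the response body for rate limit details.
Use exponential backoff when you retry failed requests.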
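
One way to combine the retry-after hint with exponential backoff is sketched below. The exception class RateLimited and the helper call_with_backoff are hypothetical stand-ins so that the block runs without a network call; with a real client you would catch its rate limit exception and read the header from the response instead.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the service, if any


def call_with_backoff(operation, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Retry operation() on RateLimited, preferring the server's retry-after hint."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimited as error:
            if attempt == max_attempts - 1:
                raise  # out of attempts
            if error.retry_after is not None:
                delay = float(error.retry_after)
            else:
                delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay)


# Simulated request: two rate-limited attempts, then success.
outcomes = iter([RateLimited(retry_after=3), RateLimited(), "run-created"])


def flaky_create():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


backoff_waits = []
backoff_result = call_with_backoff(flaky_create, sleep=backoff_waits.append)
```

Adding random jitter to the computed delay is a common refinement when many clients retry at the same time; it's omitted here to keep the example deterministic.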

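If an evaluation job fails with a 429 error during execution:

Reduce the size of your evaluation dataset, or split it into smaller batches.
Increase the tokens-per-minute (TPM) quota for your model deployment in the Azure portal.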
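
Splitting a dataset into batches can be as simple as slicing the list of rows. In the sketch below, the helper split_into_batches and the batch size of 2 are hypothetical; choose a size that fits your deployment's quota.

```python
def split_into_batches(rows, batch_size):
    """Return consecutive slices of rows, each with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [rows[start:start + batch_size] for start in range(0, len(rows), batch_size)]


all_rows = [{"query": f"question {number}"} for number in range(5)]
row_batches = split_into_batches(all_rows, batch_size=2)
batch_sizes = [len(batch) for batch in row_batches]
```

Each batch can then be provided inline or uploaded as its own dataset and evaluated in a separate run.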

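Agent evaluator tool errors

If an agent evaluator returns an error for unsupported tools:

Check the tools that agent evaluators support.
As a workaround, wrap unsupported tools as user-defined function tools so that the evaluator can assess them.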

Last updated on 2026-02-28
