Quickstart: Evaluate response quality

2025-05-17

In this quickstart, you create an MSTest app that evaluates the quality of a chat response from an OpenAI model. The test app uses the Microsoft.Extensions.AI.Evaluation libraries.

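Note

This quickstart demonstrates the simplest usage of the evaluation API. Notably, it doesn't show how to use the response caching and reporting functionality, which matters if you're authoring unit tests that run as part of an "offline" evaluation pipeline. The scenario shown here suits "online" evaluation of AI responses within production code, with scores logged to telemetry, where caching and reporting aren't relevant. For a tutorial that demonstrates the caching and reporting functionality, see Tutorial: Evaluate a model's response with response caching and reporting.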

Prerequisites

.NET 8 or a later version
Visual Studio Code (optional)

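Configure the AI service

To provision an Azure OpenAI service and model using the Azure portal, complete the steps in the Create and deploy an Azure OpenAI Service resource article. In the "Deploy a model" step, select the gpt-4o model.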

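Create the test app

Complete the following steps to create an MSTest project that connects to the gpt-4o AI model.

In a terminal window, navigate to the directory where you want to create your app, and create a new MSTest app with the dotnet new command: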
```
dotnet new mstest -o TestAI
```

Navigate to the TestAI directory, and add the necessary packages to your app:

```
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI.Abstractions
dotnet add package Microsoft.Extensions.AI.Evaluation
dotnet add package Microsoft.Extensions.AI.Evaluation.Quality
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
```

Run the following commands to add app secrets for your Azure OpenAI endpoint, model name, and tenant ID:
```
dotnet user-secrets init
dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-Azure-OpenAI-endpoint>
dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
dotnet user-secrets set AZURE_TENANT_ID <your-tenant-ID>
```
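(Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the DefaultAzureCredential.)

Open the new app in your editor of choice.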

Add the test app code

Rename the Test1.cs file to MyTests.cs, and then open the file and rename the class to MyTests.
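
The snippets in the following steps don't show their using directives. As a rough sketch, assuming the packages added earlier and that your project doesn't already import these namespaces through global or implicit usings, the top of MyTests.cs would likely need something like this:

```csharp
// Assumed using directives for MyTests.cs. Remove any that your project
// already provides through global or implicit usings.
using Azure.AI.OpenAI;                              // AzureOpenAIClient
using Azure.Identity;                               // DefaultAzureCredential
using Microsoft.Extensions.AI;                      // ChatMessage, ChatOptions, IChatClient
using Microsoft.Extensions.AI.Evaluation;           // ChatConfiguration, EvaluationResult, NumericMetric
using Microsoft.Extensions.AI.Evaluation.Quality;   // CoherenceEvaluator
using Microsoft.Extensions.Configuration;           // ConfigurationBuilder, AddUserSecrets
using Microsoft.VisualStudio.TestTools.UnitTesting; // [TestClass], [TestMethod], Assert
```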

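Add the private ChatConfiguration, chat message, and chat response members to the MyTests class. The s_messages field is a list that contains two ChatMessage objects: one instructs the chat bot how to behave, and the other is the user's question.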

```
private static ChatConfiguration? s_chatConfiguration;
private static IList<ChatMessage> s_messages = [
    new ChatMessage(
        ChatRole.System,
        """
        You're an AI assistant that can answer questions related to astronomy.
        Keep your responses concise and try to stay under 100 words.
        Use the imperial measurement system for all measurements in your response.
        """),
    new ChatMessage(
        ChatRole.User,
        "How far is the planet Venus from Earth at its closest and furthest points?")];
private static ChatResponse s_response = new();
```

Add the InitializeAsync method to the MyTests class.

```
[ClassInitialize]
public static async Task InitializeAsync(TestContext _)
{
    // Set up the ChatConfiguration, which includes the IChatClient
    // that the evaluator uses to communicate with the model.
    s_chatConfiguration = GetAzureOpenAIChatConfiguration();

    var chatOptions =
        new ChatOptions
        {
            Temperature = 0.0f,
            ResponseFormat = ChatResponseFormat.Text
        };

    // Fetch the response to be evaluated
    // and store it in a static variable.
    s_response = await s_chatConfiguration.ChatClient.GetResponseAsync(s_messages, chatOptions);
}
```

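This method accomplishes the following tasks:

Sets up the ChatConfiguration.
Sets the ChatOptions, including the Temperature and the ResponseFormat.
Fetches the response to be evaluated by calling GetResponseAsync(IEnumerable<ChatMessage>, ChatOptions, CancellationToken), and stores it in a static variable.

Add the GetAzureOpenAIChatConfiguration method, which creates the IChatClient that the evaluator uses to communicate with the model.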

```
private static ChatConfiguration GetAzureOpenAIChatConfiguration()
{
    IConfigurationRoot config = new ConfigurationBuilder().AddUserSecrets<MyTests>().Build();

    string endpoint = config["AZURE_OPENAI_ENDPOINT"];
    string model = config["AZURE_OPENAI_GPT_NAME"];
    string tenantId = config["AZURE_TENANT_ID"];

    // Get a chat client for the Azure OpenAI endpoint.
    AzureOpenAIClient azureClient =
        new(
            new Uri(endpoint),
            new DefaultAzureCredential(new DefaultAzureCredentialOptions() { TenantId = tenantId }));
    IChatClient client = azureClient.GetChatClient(deploymentName: model).AsIChatClient();

    return new ChatConfiguration(client);
}
```
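
If your environment doesn't need a tenant ID, as the earlier note about DefaultAzureCredential mentions, the method could be reduced along the following lines. This is only a sketch: the method name GetChatConfigurationWithoutTenant is hypothetical, and the explicit checks for missing secrets are an addition that isn't part of the original sample.

```csharp
// Hypothetical variant of GetAzureOpenAIChatConfiguration for environments
// where DefaultAzureCredential works without an explicit tenant ID.
// Intended to sit inside the MyTests class, next to the original method.
private static ChatConfiguration GetChatConfigurationWithoutTenant()
{
    IConfigurationRoot config = new ConfigurationBuilder().AddUserSecrets<MyTests>().Build();

    // Fail fast with a clear message if a secret is missing (an added safeguard).
    string endpoint = config["AZURE_OPENAI_ENDPOINT"]
        ?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set.");
    string model = config["AZURE_OPENAI_GPT_NAME"]
        ?? throw new InvalidOperationException("AZURE_OPENAI_GPT_NAME is not set.");

    // No DefaultAzureCredentialOptions are passed, so no tenant ID is specified.
    AzureOpenAIClient azureClient = new(new Uri(endpoint), new DefaultAzureCredential());
    IChatClient client = azureClient.GetChatClient(deploymentName: model).AsIChatClient();

    return new ChatConfiguration(client);
}
```

With this variant, the AZURE_TENANT_ID secret is never read, so it can be left unset.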

Add a test method to evaluate the model's response.

```
[TestMethod]
public async Task TestCoherence()
{
    IEvaluator coherenceEvaluator = new CoherenceEvaluator();
    EvaluationResult result = await coherenceEvaluator.EvaluateAsync(
        s_messages,
        s_response,
        s_chatConfiguration);

    // Retrieve the score for coherence from the EvaluationResult.
    NumericMetric coherence = result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);

    // Validate the default interpretation
    // for the returned coherence metric.
    Assert.IsFalse(coherence.Interpretation!.Failed);
    Assert.IsTrue(coherence.Interpretation.Rating is EvaluationRating.Good or EvaluationRating.Exceptional);

    // Validate that no diagnostics are present
    // on the returned coherence metric.
    Assert.IsFalse(coherence.ContainsDiagnostics());
}
```

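This method does the following:

Invokes the CoherenceEvaluator to evaluate the coherence of the response. The EvaluateAsync(IEnumerable<ChatMessage>, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken) method returns an EvaluationResult that contains a NumericMetric. A NumericMetric contains a numeric value that's typically used to represent a score within a well-defined range.
Retrieves the coherence score from the EvaluationResult.
Validates the default interpretation for the returned coherence metric. Evaluators can include a default interpretation for the metrics they return. You can also change the default interpretation to suit your specific requirements.
Validates that no diagnostics are present on the returned coherence metric. Evaluators can include diagnostics on the metrics they return to indicate errors, warnings, or other exceptional conditions encountered during evaluation.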
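
The point about changing the default interpretation could look roughly like the sketch below. It isn't part of the original sample: the CoherenceInterpretation class, the ApplyStrictInterpretation helper, and the 4.0 threshold are all hypothetical, and the sketch assumes that the Interpretation property is settable and that EvaluationMetricInterpretation accepts a rating, a failed flag, and a reason.

```csharp
using Microsoft.Extensions.AI.Evaluation;

// Hypothetical helper class, shown only to illustrate replacing
// the interpretation that an evaluator attaches to a metric.
public static class CoherenceInterpretation
{
    // Replaces the metric's interpretation with a stricter pass/fail rule.
    // The default minimum of 4.0 is an arbitrary, illustrative threshold.
    public static void ApplyStrictInterpretation(NumericMetric metric, double minimumScore = 4.0)
    {
        // A missing value means the evaluator produced no score.
        if (metric.Value is not double score)
        {
            metric.Interpretation = new EvaluationMetricInterpretation(
                EvaluationRating.Inconclusive,
                failed: true,
                reason: "The evaluator returned no score.");
            return;
        }

        bool passed = score >= minimumScore;
        metric.Interpretation = new EvaluationMetricInterpretation(
            passed ? EvaluationRating.Good : EvaluationRating.Unacceptable,
            failed: !passed,
            reason: $"Score {score} was compared against a minimum of {minimumScore}.");
    }
}
```

In TestCoherence, such a helper would be called on the coherence metric before the assertions, for example `CoherenceInterpretation.ApplyStrictInterpretation(coherence);`, so that the assertions check the custom rule rather than the evaluator's default.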

Run the test/evaluation

Run the test using your preferred test workflow, for example, by using the CLI command dotnet test or through Test Explorer.

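Clean up resources

If you no longer need them, delete the Azure OpenAI resource and the gpt-4o model deployment.

In the Azure portal, navigate to the Azure OpenAI resource.
Select the Azure OpenAI resource, and then select Delete.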

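Next steps

Evaluate the responses from different OpenAI models.
Add response caching and reporting to your evaluation code. For more information, see Tutorial: Evaluate a model's response with response caching and reporting.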
