通过


你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn

教程:使用 Foundry Local 生成多轮聊天助手

在本教程中,你将生成一个完全在设备上运行的交互式聊天助手。 助手在多个交换中维护对话上下文,因此它会记住你在对话前面讨论的内容。 可以使用 Foundry 本地 SDK 选择模型,定义系统提示逐令牌流式传输响应。

本教程中,您将学习如何:

  • 设置项目并安装 Foundry 本地 SDK
  • 浏览模型目录并选择模型
  • 定义用于塑造助理行为的系统提示
  • 使用消息历史记录实现多轮次对话
  • 流式传输响应以提升响应体验
  • 在会话结束时清理资源

先决条件

  • 至少具有 8 GB RAM 的 Windows、macOS 或 Linux 计算机。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。 若要克隆存储库并导航到示例,请使用:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/cs/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署,请选择“Windows”选项卡。Windows包与Windows ML运行时集成,提供相同的API接口,并支持更广泛的硬件加速范围。

dotnet add package Microsoft.AI.Foundry.Local.WinML
dotnet add package OpenAI

GitHub存储库中的 C# 示例是预配置项目。 如果要从头开始生成,则应阅读 Foundry Local SDK 参考 ,详细了解如何使用 Foundry Local 设置 C# 项目。

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录,其中列出了所有可用的模型。 在此步骤中,将初始化 SDK 并为聊天助手选择模型。

  • 打开 Program.cs 并将其内容替换为以下代码来初始化 SDK 并选择模型:

    CancellationToken ct = CancellationToken.None;
    
    var config = new Configuration
    {
        AppName = "foundry_local_samples",
        LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
    };
    
    using var loggerFactory = LoggerFactory.Create(builder =>
    {
        builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
    });
    var logger = loggerFactory.CreateLogger<Program>();
    
    // Initialize the singleton instance
    await FoundryLocalManager.CreateAsync(config, logger);
    var mgr = FoundryLocalManager.Instance;
    
    // Download and register all execution providers.
    var currentEp = "";
    await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
    {
        if (epName != currentEp)
        {
            if (currentEp != "") Console.WriteLine();
            currentEp = epName;
        }
        Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
    });
    if (currentEp != "") Console.WriteLine();
    
    // Select and load a model from the catalog
    var catalog = await mgr.GetCatalogAsync();
    var model = await catalog.GetModelAsync("qwen2.5-0.5b")
        ?? throw new Exception("Model not found");
    
    await model.DownloadAsync(progress =>
    {
        Console.Write($"\rDownloading model: {progress:F2}%");
        if (progress >= 100f) Console.WriteLine();
    });
    
    await model.LoadAsync();
    Console.WriteLine("Model loaded and ready.");
    
    // Get a chat client
    var chatClient = await model.GetChatClientAsync();
    

    该方法 GetModelAsync 接受模型别名,该别名是映射到目录中特定模型的短友好名称。 该方法 DownloadAsync 将模型权重提取到本地缓存,并使 LoadAsync 模型准备好进行推理。

定义系统提示

系统提示设置助手的个性和行为。 它是对话历史记录中的第一条消息,模型在整个对话中引用它。

添加系统提示以调整助手的响应方式:

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

小窍门

尝试使用不同的系统提示来更改助手的行为。 例如,可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

聊天助理需要跨多个对话维护上下文。 通过保留所有消息的列表(系统、用户和助理),并发送包含每个请求的完整列表来实现此目的。 模型使用此历史记录生成上下文相关的响应。

添加一个对话循环,该循环:

  • 从控制台读取用户输入。
  • 将用户消息追加到历史记录。
  • 将完整历史记录发送到模型。
  • 将助手的响应追加到下一轮对话历史中。
while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

每次调用CompleteChatAsync 都会接收完整的消息历史记录。 这就是模型“记住”之前轮次的方式 , 它不会在调用之间存储状态。

添加流响应

流式传输在每个令牌生成时打印出来,这使得助手更具响应性。 将 CompleteChatAsync 调用替换为 CompleteChatStreamingAsync 以逐个令牌流式传输响应。

更新会话循环以使用流式处理:

// Stream the response token by token
Console.Write("Assistant: ");
var fullResponse = string.Empty;
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    var content = chunk.Choices[0].Message.Content;
    if (!string.IsNullOrEmpty(content))
    {
        Console.Write(content);
        Console.Out.Flush();
        fullResponse += content;
    }
}
Console.WriteLine("\n");

流版本会累积完整响应,以便在流完成后将其添加到聊天历史记录中。

完整代码

Program.cs 的内容替换为完整的代码:

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

// Select and load a model from the catalog
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Model not found");

await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await model.LoadAsync();
Console.WriteLine("Model loaded and ready.");

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

Console.WriteLine("\nChat assistant ready! Type 'quit' to exit.\n");

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

// Clean up - unload the model
await model.UnloadAsync();
Console.WriteLine("Model unloaded. Goodbye!");

运行聊天助理:

dotnet run

会看到类似于以下内容的输出:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要?” 时, 它知道你仍在谈论光合作用。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。 若要克隆存储库并导航到示例,请使用:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/js/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署,请选择“Windows”选项卡。Windows包与Windows ML运行时集成,提供相同的API接口,并支持更广泛的硬件加速范围。

npm install foundry-local-sdk-winml openai

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录,其中列出了所有可用的模型。 在此步骤中,将初始化 SDK 并为聊天助手选择模型。

  1. 创建名为 的文件 index.js

  2. 添加以下代码以初始化 SDK 并选择模型:

    // Initialize the Foundry Local SDK
    const manager = FoundryLocalManager.create({
        appName: 'foundry_local_samples',
        logLevel: 'info'
    });
    
    // Download and register all execution providers.
    let currentEp = '';
    await manager.downloadAndRegisterEps((epName, percent) => {
        if (epName !== currentEp) {
            if (currentEp !== '') process.stdout.write('\n');
            currentEp = epName;
        }
        process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
    });
    if (currentEp !== '') process.stdout.write('\n');
    
    // Select and load a model from the catalog
    const model = await manager.catalog.getModel('qwen2.5-0.5b');
    
    await model.download((progress) => {
        process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
    });
    console.log('\nModel downloaded.');
    
    await model.load();
    console.log('Model loaded and ready.');
    
    // Create a chat client
    const chatClient = model.createChatClient();
    

    该方法 getModel 接受模型别名,该别名是映射到目录中特定模型的短友好名称。 该方法 download 将模型权重提取到本地缓存,并使 load 模型准备好进行推理。

定义系统提示

系统提示设置助手的个性和行为。 它是对话历史记录中的第一条消息,模型在整个对话中引用它。

添加系统提示以调整助手的响应方式:

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

小窍门

尝试使用不同的系统提示来更改助手的行为。 例如,可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

聊天助理需要跨多个对话维护上下文。 通过保留所有消息的列表(系统、用户和助理),并发送包含每个请求的完整列表来实现此目的。 模型使用此历史记录生成上下文相关的响应。

添加一个对话循环,该循环:

  • 从控制台读取用户输入。
  • 将用户消息追加到历史记录。
  • 将完整历史记录发送到模型。
  • 将助手的响应追加到下一轮对话历史中。
while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

每次调用completeChat 都会接收完整的消息历史记录。 这就是模型“记住”之前轮次的方式 , 它不会在调用之间存储状态。

添加流响应

流式传输在每个令牌生成时打印出来,这使得助手更具响应性。 将 completeChat 调用替换为 completeStreamingChat 以逐个令牌流式传输响应。

更新会话循环以使用流式处理:

// Stream the response token by token
process.stdout.write('Assistant: ');
let fullResponse = '';
for await (const chunk of chatClient.completeStreamingChat(messages)) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
        process.stdout.write(content);
        fullResponse += content;
    }
}
console.log('\n');

流版本会累积完整响应,以便在流完成后将其添加到聊天历史记录中。

完整代码

创建一个名为 index.js 并添加以下完整代码的文件:

import { FoundryLocalManager } from 'foundry-local-sdk';
import * as readline from 'readline';

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Select and load a model from the catalog
const model = await manager.catalog.getModel('qwen2.5-0.5b');

await model.download((progress) => {
    process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
});
console.log('\nModel downloaded.');

await model.load();
console.log('Model loaded and ready.');

// Create a chat client
const chatClient = model.createChatClient();

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

// Set up readline for console input
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

const askQuestion = (prompt) => new Promise((resolve) => rl.question(prompt, resolve));

console.log('\nChat assistant ready! Type \'quit\' to exit.\n');

while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

// Clean up - unload the model
await model.unload();
console.log('Model unloaded. Goodbye!');
rl.close();

运行聊天助理:

node index.js

会看到类似于以下内容的输出:

Downloading model: 100.00%
Model downloaded.
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要?” 时, 它知道你仍在谈论光合作用。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。 若要克隆存储库并导航到示例,请使用:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/python/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署,请选择“Windows”选项卡。Windows包与Windows ML运行时集成,提供相同的API接口,并支持更广泛的硬件加速范围。

pip install foundry-local-sdk-winml openai

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录,其中列出了所有可用的模型。 在此步骤中,将初始化 SDK 并为聊天助手选择模型。

  1. 创建名为 的文件 main.py

  2. 添加以下代码以初始化 SDK 并选择模型:

    # Initialize the Foundry Local SDK
    config = Configuration(app_name="foundry_local_samples")
    FoundryLocalManager.initialize(config)
    manager = FoundryLocalManager.instance
    
    # Download and register all execution providers.
    current_ep = ""
    def ep_progress(ep_name: str, percent: float):
        nonlocal current_ep
        if ep_name != current_ep:
            if current_ep:
                print()
            current_ep = ep_name
        print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)
    
    manager.download_and_register_eps(progress_callback=ep_progress)
    if current_ep:
        print()
    
    # Select and load a model from the catalog
    model = manager.catalog.get_model("qwen2.5-0.5b")
    model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
    print()
    model.load()
    print("Model loaded and ready.")
    
    # Get a chat client
    client = model.get_chat_client()
    

    该方法 get_model 接受模型别名,该别名是映射到目录中特定模型的短友好名称。 该方法 download 将模型权重提取到本地缓存,并使 load 模型准备好进行推理。

定义系统提示

系统提示设置助手的个性和行为。 它是对话历史记录中的第一条消息,模型在整个对话中引用它。

添加系统提示以调整助手的响应方式:

# Start the conversation with a system prompt
messages = [
    {
        "role": "system",
        "content": "You are a helpful, friendly assistant. Keep your responses "
                   "concise and conversational. If you don't know something, say so."
    }
]

小窍门

尝试使用不同的系统提示来更改助手的行为。 例如,可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

聊天助理需要跨多个交换维护上下文。 通过保留所有消息的列表(系统、用户和助理),并发送包含每个请求的完整列表来实现此目的。 模型使用此历史记录生成上下文相关的响应。

添加一个对话循环,该循环:

  • 从控制台读取用户输入。
  • 将用户消息追加到历史记录。
  • 将完整历史记录发送到模型。
  • 将助手的响应追加到下一轮历史记录。
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ("quit", "exit"):
        break

    # Add the user's message to conversation history
    messages.append({"role": "user", "content": user_input})

    # Stream the response token by token
    print("Assistant: ", end="", flush=True)
    full_response = ""
    for chunk in client.complete_streaming_chat(messages):
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            full_response += content
    print("\n")

    # Add the complete response to conversation history
    messages.append({"role": "assistant", "content": full_response})

每次调用 complete_chat 都会接收到完整的消息历史记录。 这就是模型“记住”之前轮次的方式 , 它不会在调用之间存储状态。

添加流响应

流式处理会在生成令牌的过程中打印每个令牌,这使得助手的反应更迅速。 将 complete_chat 调用替换为 complete_streaming_chat 按令牌流式传输响应令牌。

更新会话循环以使用流式传输:

# Stream the response token by token
print("Assistant: ", end="", flush=True)
full_response = ""
for chunk in client.complete_streaming_chat(messages):
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content
print("\n")

流版本会累积完整响应,以便在流完成后将其添加到聊天历史记录中。

完整代码

创建一个名为 main.py 并添加以下完整代码的文件:

from foundry_local_sdk import Configuration, FoundryLocalManager


def main():
    # Initialize the Foundry Local SDK
    config = Configuration(app_name="foundry_local_samples")
    FoundryLocalManager.initialize(config)
    manager = FoundryLocalManager.instance

    # Download and register all execution providers.
    current_ep = ""
    def ep_progress(ep_name: str, percent: float):
        nonlocal current_ep
        if ep_name != current_ep:
            if current_ep:
                print()
            current_ep = ep_name
        print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

    manager.download_and_register_eps(progress_callback=ep_progress)
    if current_ep:
        print()

    # Select and load a model from the catalog
    model = manager.catalog.get_model("qwen2.5-0.5b")
    model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
    print()
    model.load()
    print("Model loaded and ready.")

    # Get a chat client
    client = model.get_chat_client()

    # Start the conversation with a system prompt
    messages = [
        {
            "role": "system",
            "content": "You are a helpful, friendly assistant. Keep your responses "
                       "concise and conversational. If you don't know something, say so."
        }
    ]

    print("\nChat assistant ready! Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ")
        if user_input.strip().lower() in ("quit", "exit"):
            break

        # Add the user's message to conversation history
        messages.append({"role": "user", "content": user_input})

        # Stream the response token by token
        print("Assistant: ", end="", flush=True)
        full_response = ""
        for chunk in client.complete_streaming_chat(messages):
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        print("\n")

        # Add the complete response to conversation history
        messages.append({"role": "assistant", "content": full_response})

    # Clean up - unload the model
    model.unload()
    print("Model unloaded. Goodbye!")


if __name__ == "__main__":
    main()

运行聊天助理:

python main.py

会看到类似于以下内容的输出:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要?” 时, 它知道你仍在谈论光合作用。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。 若要克隆存储库并导航到示例,请使用:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/rust/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署,请选择“Windows”选项卡。Windows包与Windows ML运行时集成,提供相同的API接口,并支持更广泛的硬件加速范围。

cargo add foundry-local-sdk --features winml
cargo add tokio --features full
cargo add tokio-stream anyhow

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录,其中列出了所有可用的模型。 在此步骤中,将初始化 SDK 并为聊天助手选择模型。

  • 打开 src/main.rs 并将其内容替换为以下代码来初始化 SDK 并选择模型:

    // Initialize the Foundry Local SDK
    let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;
    
    // Download and register all execution providers.
    manager
        .download_and_register_eps_with_progress(None, {
            let mut current_ep = String::new();
            move |ep_name: &str, percent: f64| {
                if ep_name != current_ep {
                    if !current_ep.is_empty() {
                        println!();
                    }
                    current_ep = ep_name.to_string();
                }
                print!("\r  {:<30}  {:5.1}%", ep_name, percent);
                io::stdout().flush().ok();
            }
        })
        .await?;
    println!();
    
    // Select and load a model from the catalog
    let model = manager.catalog().get_model("qwen2.5-0.5b").await?;
    
    if !model.is_cached().await? {
        println!("Downloading model...");
        model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }
    
    model.load().await?;
    println!("Model loaded and ready.");
    
    // Create a chat client
    let client = model.create_chat_client().temperature(0.7).max_tokens(512);
    

    该方法 get_model 接受模型别名,该别名是映射到目录中特定模型的短友好名称。 该方法 download 将模型权重提取到本地缓存,并使 load 模型准备好进行推理。

定义系统提示

系统提示设置助手的个性和行为。 它是对话历史记录中的第一条消息,模型在整个对话中引用它。

添加系统提示以调整助手的响应方式:

// Start the conversation with a system prompt
let mut messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from(
        "You are a helpful, friendly assistant. Keep your responses \
         concise and conversational. If you don't know something, say so.",
    )
    .into(),
];

小窍门

尝试使用不同的系统提示来更改助手的行为。 例如,可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

聊天助理需要跨多个交换维护上下文。 通过保留所有消息(系统、用户和助理)的向量并发送包含每个请求的完整列表来实现此目的。 模型使用此历史记录生成上下文相关的响应。

添加一个对话循环:

  • 从控制台读取用户输入。
  • 将用户消息追加到历史记录。
  • 将完整历史记录发送到模型。
  • 将助手的响应追加到下一轮历史记录。
loop {
    print!("You: ");
    io::stdout().flush()?;

    let mut input = String::new();
    stdin.lock().read_line(&mut input)?;
    let input = input.trim();

    if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
        break;
    }

    // Add the user's message to conversation history
    messages.push(ChatCompletionRequestUserMessage::from(input).into());

    // Stream the response token by token
    print!("Assistant: ");
    io::stdout().flush()?;
    let mut full_response = String::new();
    let mut stream = client.complete_streaming_chat(&messages, None).await?;
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        if let Some(choice) = chunk.choices.first() {
            if let Some(ref content) = choice.delta.content {
                print!("{content}");
                io::stdout().flush()?;
                full_response.push_str(content);
            }
        }
    }
    println!("\n");

    // Add the complete response to conversation history
    let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
        serde_json::json!({"role": "assistant", "content": full_response}),
    )?;
    messages.push(assistant_msg);
}

每次调用以 complete_chat 接收完整的消息历史记录。 这就是模型“记住”之前轮次的方式 , 它不会在调用之间存储状态。

添加流响应

流式传输在每个令牌生成时打印出来,这使得助手更具响应性。 将 complete_chat 调用替换为 complete_streaming_chat 按令牌流式传输响应令牌。

更新会话循环以使用流式处理:

// Stream the response token by token
print!("Assistant: ");
io::stdout().flush()?;
let mut full_response = String::new();
let mut stream = client.complete_streaming_chat(&messages, None).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(choice) = chunk.choices.first() {
        if let Some(ref content) = choice.delta.content {
            print!("{content}");
            io::stdout().flush()?;
            full_response.push_str(content);
        }
    }
}
println!("\n");

流版本会累积完整响应,以便在流完成后将其添加到聊天历史记录中。

完整代码

src/main.rs 的内容替换为完整的代码:

use foundry_local_sdk::{
    ChatCompletionRequestMessage,
    ChatCompletionRequestSystemMessage, ChatCompletionRequestUserMessage,
    FoundryLocalConfig, FoundryLocalManager,
};
use std::io::{self, BufRead, Write};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the Foundry Local SDK
    let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;

    // Download and register all execution providers.
    manager
        .download_and_register_eps_with_progress(None, {
            let mut current_ep = String::new();
            move |ep_name: &str, percent: f64| {
                if ep_name != current_ep {
                    if !current_ep.is_empty() {
                        println!();
                    }
                    current_ep = ep_name.to_string();
                }
                print!("\r  {:<30}  {:5.1}%", ep_name, percent);
                io::stdout().flush().ok();
            }
        })
        .await?;
    println!();

    // Select and load a model from the catalog
    let model = manager.catalog().get_model("qwen2.5-0.5b").await?;

    if !model.is_cached().await? {
        println!("Downloading model...");
        model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    model.load().await?;
    println!("Model loaded and ready.");

    // Create a chat client
    let client = model.create_chat_client().temperature(0.7).max_tokens(512);

    // Start the conversation with a system prompt
    let mut messages: Vec<ChatCompletionRequestMessage> = vec![
        ChatCompletionRequestSystemMessage::from(
            "You are a helpful, friendly assistant. Keep your responses \
             concise and conversational. If you don't know something, say so.",
        )
        .into(),
    ];

    println!("\nChat assistant ready! Type 'quit' to exit.\n");

    let stdin = io::stdin();
    loop {
        print!("You: ");
        io::stdout().flush()?;

        let mut input = String::new();
        stdin.lock().read_line(&mut input)?;
        let input = input.trim();

        if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
            break;
        }

        // Add the user's message to conversation history
        messages.push(ChatCompletionRequestUserMessage::from(input).into());

        // Stream the response token by token
        print!("Assistant: ");
        io::stdout().flush()?;
        let mut full_response = String::new();
        let mut stream = client.complete_streaming_chat(&messages, None).await?;
        while let Some(chunk) = stream.next().await {
            let chunk = chunk?;
            if let Some(choice) = chunk.choices.first() {
                if let Some(ref content) = choice.delta.content {
                    print!("{content}");
                    io::stdout().flush()?;
                    full_response.push_str(content);
                }
            }
        }
        println!("\n");

        // Add the complete response to conversation history
        let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
            serde_json::json!({"role": "assistant", "content": full_response}),
        )?;
        messages.push(assistant_msg);
    }

    // Clean up - unload the model
    model.unload().await?;
    println!("Model unloaded. Goodbye!");

    Ok(())
}

运行聊天助理:

cargo run

会看到类似于以下内容的输出:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要?” 时, 它知道你仍在谈论光合作用。

清理资源

卸载模型后,模型权重将保留在本地缓存中。 这意味着下次运行应用程序时,将跳过下载步骤,模型加载速度更快。 除非需要回收磁盘空间,否则不需要进行额外的清理。