你当前正在访问 Microsoft Azure Global Edition 技术文档网站。如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站，请访问 https://docs.azure.cn。

教程：使用 Foundry Local 生成多轮聊天助手

在本教程中，你将生成一个完全在设备上运行的交互式聊天助手。助手在多个交换中维护对话上下文，因此它会记住你在对话前面讨论的内容。可以使用 Foundry 本地 SDK 选择模型，定义系统提示逐令牌流式传输响应。

本教程中，您将学习如何：

设置项目并安装 Foundry 本地 SDK
浏览模型目录并选择模型
定义用于塑造助理行为的系统提示
使用消息历史记录实现多轮次对话
流式传输响应以提升响应体验
在会话结束时清理资源

先决条件

至少具有 8 GB RAM 的 Windows、macOS 或 Linux 计算机。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。若要克隆存储库并导航到示例，请使用：

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/cs/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署，请选择“Windows”选项卡。Windows包与Windows ML运行时集成，提供相同的API接口，并支持更广泛的硬件加速范围。

Windows
跨平台

dotnet add package Microsoft.AI.Foundry.Local.WinML
dotnet add package OpenAI

dotnet add package Microsoft.AI.Foundry.Local
dotnet add package OpenAI

GitHub存储库中的 C# 示例是预配置项目。如果要从头开始生成，则应阅读 Foundry Local SDK 参考，详细了解如何使用 Foundry Local 设置 C# 项目。

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录，其中列出了所有可用的模型。在此步骤中，将初始化 SDK 并为聊天助手选择模型。

打开 Program.cs 并将其内容替换为以下代码来初始化 SDK 并选择模型：

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

// Select and load a model from the catalog
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Model not found");

await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await model.LoadAsync();
Console.WriteLine("Model loaded and ready.");

// Get a chat client
var chatClient = await model.GetChatClientAsync();

该方法 GetModelAsync 接受模型别名，该别名是映射到目录中特定模型的短友好名称。该方法 DownloadAsync 将模型权重提取到本地缓存，并使 LoadAsync 模型准备好进行推理。

定义系统提示

系统提示设置助手的个性和行为。它是对话历史记录中的第一条消息，模型在整个对话中引用它。

添加系统提示以调整助手的响应方式：

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

小窍门

尝试使用不同的系统提示来更改助手的行为。例如，可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

聊天助理需要跨多个对话维护上下文。通过保留所有消息的列表（系统、用户和助理），并发送包含每个请求的完整列表来实现此目的。模型使用此历史记录生成上下文相关的响应。

添加一个对话循环，该循环：

从控制台读取用户输入。
将用户消息追加到历史记录。
将完整历史记录发送到模型。
将助手的响应追加到下一轮对话历史中。

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

每次调用CompleteChatAsync 都会接收完整的消息历史记录。这就是模型“记住”之前轮次的方式，它不会在调用之间存储状态。

添加流响应

流式传输在每个令牌生成时打印出来，这使得助手更具响应性。将 CompleteChatAsync 调用替换为 CompleteChatStreamingAsync 以逐个令牌流式传输响应。

更新会话循环以使用流式处理：

// Stream the response token by token
Console.Write("Assistant: ");
var fullResponse = string.Empty;
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    var content = chunk.Choices[0].Message.Content;
    if (!string.IsNullOrEmpty(content))
    {
        Console.Write(content);
        Console.Out.Flush();
        fullResponse += content;
    }
}
Console.WriteLine("\n");

流版本会累积完整响应，以便在流完成后将其添加到聊天历史记录中。

完整代码

将 Program.cs 的内容替换为完整的代码：

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

// Select and load a model from the catalog
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Model not found");

await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await model.LoadAsync();
Console.WriteLine("Model loaded and ready.");

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

Console.WriteLine("\nChat assistant ready! Type 'quit' to exit.\n");

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

// Clean up - unload the model
await model.UnloadAsync();
Console.WriteLine("Model unloaded. Goodbye!");

运行聊天助理：

dotnet run

会看到类似于以下内容的输出：

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要？” 时，它知道你仍在谈论光合作用。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。若要克隆存储库并导航到示例，请使用：

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/js/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署，请选择“Windows”选项卡。Windows包与Windows ML运行时集成，提供相同的API接口，并支持更广泛的硬件加速范围。

Windows
跨平台

npm install foundry-local-sdk-winml openai

npm install foundry-local-sdk openai

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录，其中列出了所有可用的模型。在此步骤中，将初始化 SDK 并为聊天助手选择模型。

创建名为的文件 index.js。

添加以下代码以初始化 SDK 并选择模型：

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Select and load a model from the catalog
const model = await manager.catalog.getModel('qwen2.5-0.5b');

await model.download((progress) => {
    process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
});
console.log('\nModel downloaded.');

await model.load();
console.log('Model loaded and ready.');

// Create a chat client
const chatClient = model.createChatClient();

该方法 getModel 接受模型别名，该别名是映射到目录中特定模型的短友好名称。该方法 download 将模型权重提取到本地缓存，并使 load 模型准备好进行推理。

定义系统提示

系统提示设置助手的个性和行为。它是对话历史记录中的第一条消息，模型在整个对话中引用它。

添加系统提示以调整助手的响应方式：

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

小窍门

尝试使用不同的系统提示来更改助手的行为。例如，可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

添加一个对话循环，该循环：

从控制台读取用户输入。
将用户消息追加到历史记录。
将完整历史记录发送到模型。
将助手的响应追加到下一轮对话历史中。

while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

每次调用completeChat 都会接收完整的消息历史记录。这就是模型“记住”之前轮次的方式，它不会在调用之间存储状态。

添加流响应

流式传输在每个令牌生成时打印出来，这使得助手更具响应性。将 completeChat 调用替换为 completeStreamingChat 以逐个令牌流式传输响应。

更新会话循环以使用流式处理：

// Stream the response token by token
process.stdout.write('Assistant: ');
let fullResponse = '';
for await (const chunk of chatClient.completeStreamingChat(messages)) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
        process.stdout.write(content);
        fullResponse += content;
    }
}
console.log('\n');

流版本会累积完整响应，以便在流完成后将其添加到聊天历史记录中。

完整代码

创建一个名为 index.js 并添加以下完整代码的文件：

import { FoundryLocalManager } from 'foundry-local-sdk';
import * as readline from 'readline';

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Select and load a model from the catalog
const model = await manager.catalog.getModel('qwen2.5-0.5b');

await model.download((progress) => {
    process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
});
console.log('\nModel downloaded.');

await model.load();
console.log('Model loaded and ready.');

// Create a chat client
const chatClient = model.createChatClient();

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

// Set up readline for console input
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

const askQuestion = (prompt) => new Promise((resolve) => rl.question(prompt, resolve));

console.log('\nChat assistant ready! Type \'quit\' to exit.\n');

while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

// Clean up - unload the model
await model.unload();
console.log('Model unloaded. Goodbye!');
rl.close();

运行聊天助理：

node index.js

会看到类似于以下内容的输出：

Downloading model: 100.00%
Model downloaded.
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要？” 时，它知道你仍在谈论光合作用。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。若要克隆存储库并导航到示例，请使用：

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/python/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署，请选择“Windows”选项卡。Windows包与Windows ML运行时集成，提供相同的API接口，并支持更广泛的硬件加速范围。

Windows
跨平台

pip install foundry-local-sdk-winml openai

pip install foundry-local-sdk openai

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录，其中列出了所有可用的模型。在此步骤中，将初始化 SDK 并为聊天助手选择模型。

创建名为的文件 main.py。

添加以下代码以初始化 SDK 并选择模型：

# Initialize the Foundry Local SDK
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# Download and register all execution providers.
current_ep = ""
def ep_progress(ep_name: str, percent: float):
    nonlocal current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

manager.download_and_register_eps(progress_callback=ep_progress)
if current_ep:
    print()

# Select and load a model from the catalog
model = manager.catalog.get_model("qwen2.5-0.5b")
model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
print()
model.load()
print("Model loaded and ready.")

# Get a chat client
client = model.get_chat_client()

该方法 get_model 接受模型别名，该别名是映射到目录中特定模型的短友好名称。该方法 download 将模型权重提取到本地缓存，并使 load 模型准备好进行推理。

定义系统提示

系统提示设置助手的个性和行为。它是对话历史记录中的第一条消息，模型在整个对话中引用它。

添加系统提示以调整助手的响应方式：

# Start the conversation with a system prompt
messages = [
    {
        "role": "system",
        "content": "You are a helpful, friendly assistant. Keep your responses "
                   "concise and conversational. If you don't know something, say so."
    }
]

小窍门

尝试使用不同的系统提示来更改助手的行为。例如，可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

聊天助理需要跨多个交换维护上下文。通过保留所有消息的列表（系统、用户和助理），并发送包含每个请求的完整列表来实现此目的。模型使用此历史记录生成上下文相关的响应。

添加一个对话循环，该循环：

从控制台读取用户输入。
将用户消息追加到历史记录。
将完整历史记录发送到模型。
将助手的响应追加到下一轮历史记录。

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ("quit", "exit"):
        break

    # Add the user's message to conversation history
    messages.append({"role": "user", "content": user_input})

    # Stream the response token by token
    print("Assistant: ", end="", flush=True)
    full_response = ""
    for chunk in client.complete_streaming_chat(messages):
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            full_response += content
    print("\n")

    # Add the complete response to conversation history
    messages.append({"role": "assistant", "content": full_response})

每次调用 complete_chat 都会接收到完整的消息历史记录。这就是模型“记住”之前轮次的方式，它不会在调用之间存储状态。

添加流响应

流式处理会在生成令牌的过程中打印每个令牌，这使得助手的反应更迅速。将 complete_chat 调用替换为 complete_streaming_chat 按令牌流式传输响应令牌。

更新会话循环以使用流式传输：

# Stream the response token by token
print("Assistant: ", end="", flush=True)
full_response = ""
for chunk in client.complete_streaming_chat(messages):
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content
print("\n")

流版本会累积完整响应，以便在流完成后将其添加到聊天历史记录中。

完整代码

创建一个名为 main.py 并添加以下完整代码的文件：

from foundry_local_sdk import Configuration, FoundryLocalManager


def main():
    # Initialize the Foundry Local SDK
    config = Configuration(app_name="foundry_local_samples")
    FoundryLocalManager.initialize(config)
    manager = FoundryLocalManager.instance

    # Download and register all execution providers.
    current_ep = ""
    def ep_progress(ep_name: str, percent: float):
        nonlocal current_ep
        if ep_name != current_ep:
            if current_ep:
                print()
            current_ep = ep_name
        print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

    manager.download_and_register_eps(progress_callback=ep_progress)
    if current_ep:
        print()

    # Select and load a model from the catalog
    model = manager.catalog.get_model("qwen2.5-0.5b")
    model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
    print()
    model.load()
    print("Model loaded and ready.")

    # Get a chat client
    client = model.get_chat_client()

    # Start the conversation with a system prompt
    messages = [
        {
            "role": "system",
            "content": "You are a helpful, friendly assistant. Keep your responses "
                       "concise and conversational. If you don't know something, say so."
        }
    ]

    print("\nChat assistant ready! Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ")
        if user_input.strip().lower() in ("quit", "exit"):
            break

        # Add the user's message to conversation history
        messages.append({"role": "user", "content": user_input})

        # Stream the response token by token
        print("Assistant: ", end="", flush=True)
        full_response = ""
        for chunk in client.complete_streaming_chat(messages):
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        print("\n")

        # Add the complete response to conversation history
        messages.append({"role": "assistant", "content": full_response})

    # Clean up - unload the model
    model.unload()
    print("Model unloaded. Goodbye!")


if __name__ == "__main__":
    main()

运行聊天助理：

python main.py

会看到类似于以下内容的输出：

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要？” 时，它知道你仍在谈论光合作用。

示例存储库

本文的完整示例代码在 Foundry Local GitHub 存储库中提供。若要克隆存储库并导航到示例，请使用：

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/rust/tutorial-chat-assistant

安装软件包

如果您正在Windows上开发或部署，请选择“Windows”选项卡。Windows包与Windows ML运行时集成，提供相同的API接口，并支持更广泛的硬件加速范围。

Windows
跨平台

cargo add foundry-local-sdk --features winml
cargo add tokio --features full
cargo add tokio-stream anyhow

cargo add foundry-local-sdk
cargo add tokio --features full
cargo add tokio-stream anyhow

浏览目录并选择模型

Foundry Local SDK 提供了一个模型目录，其中列出了所有可用的模型。在此步骤中，将初始化 SDK 并为聊天助手选择模型。

打开 src/main.rs 并将其内容替换为以下代码来初始化 SDK 并选择模型：

// Initialize the Foundry Local SDK
let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;

// Download and register all execution providers.
manager
    .download_and_register_eps_with_progress(None, {
        let mut current_ep = String::new();
        move |ep_name: &str, percent: f64| {
            if ep_name != current_ep {
                if !current_ep.is_empty() {
                    println!();
                }
                current_ep = ep_name.to_string();
            }
            print!("\r  {:<30}  {:5.1}%", ep_name, percent);
            io::stdout().flush().ok();
        }
    })
    .await?;
println!();

// Select and load a model from the catalog
let model = manager.catalog().get_model("qwen2.5-0.5b").await?;

if !model.is_cached().await? {
    println!("Downloading model...");
    model
        .download(Some(|progress: f64| {
            print!("\r  {progress:.1}%");
            io::stdout().flush().ok();
        }))
        .await?;
    println!();
}

model.load().await?;
println!("Model loaded and ready.");

// Create a chat client
let client = model.create_chat_client().temperature(0.7).max_tokens(512);

定义系统提示

系统提示设置助手的个性和行为。它是对话历史记录中的第一条消息，模型在整个对话中引用它。

添加系统提示以调整助手的响应方式：

// Start the conversation with a system prompt
let mut messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from(
        "You are a helpful, friendly assistant. Keep your responses \
         concise and conversational. If you don't know something, say so.",
    )
    .into(),
];

小窍门

尝试使用不同的系统提示来更改助手的行为。例如，可以指示它作为海盗、教师或域专家做出响应。

实现多回合对话

聊天助理需要跨多个交换维护上下文。通过保留所有消息（系统、用户和助理）的向量并发送包含每个请求的完整列表来实现此目的。模型使用此历史记录生成上下文相关的响应。

添加一个对话循环：

从控制台读取用户输入。
将用户消息追加到历史记录。
将完整历史记录发送到模型。
将助手的响应追加到下一轮历史记录。

loop {
    print!("You: ");
    io::stdout().flush()?;

    let mut input = String::new();
    stdin.lock().read_line(&mut input)?;
    let input = input.trim();

    if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
        break;
    }

    // Add the user's message to conversation history
    messages.push(ChatCompletionRequestUserMessage::from(input).into());

    // Stream the response token by token
    print!("Assistant: ");
    io::stdout().flush()?;
    let mut full_response = String::new();
    let mut stream = client.complete_streaming_chat(&messages, None).await?;
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        if let Some(choice) = chunk.choices.first() {
            if let Some(ref content) = choice.delta.content {
                print!("{content}");
                io::stdout().flush()?;
                full_response.push_str(content);
            }
        }
    }
    println!("\n");

    // Add the complete response to conversation history
    let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
        serde_json::json!({"role": "assistant", "content": full_response}),
    )?;
    messages.push(assistant_msg);
}

每次调用以 complete_chat 接收完整的消息历史记录。这就是模型“记住”之前轮次的方式，它不会在调用之间存储状态。

添加流响应

流式传输在每个令牌生成时打印出来，这使得助手更具响应性。将 complete_chat 调用替换为 complete_streaming_chat 按令牌流式传输响应令牌。

更新会话循环以使用流式处理：

// Stream the response token by token
print!("Assistant: ");
io::stdout().flush()?;
let mut full_response = String::new();
let mut stream = client.complete_streaming_chat(&messages, None).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(choice) = chunk.choices.first() {
        if let Some(ref content) = choice.delta.content {
            print!("{content}");
            io::stdout().flush()?;
            full_response.push_str(content);
        }
    }
}
println!("\n");

流版本会累积完整响应，以便在流完成后将其添加到聊天历史记录中。

完整代码

将 src/main.rs 的内容替换为完整的代码：

use foundry_local_sdk::{
    ChatCompletionRequestMessage,
    ChatCompletionRequestSystemMessage, ChatCompletionRequestUserMessage,
    FoundryLocalConfig, FoundryLocalManager,
};
use std::io::{self, BufRead, Write};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the Foundry Local SDK
    let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;

    // Download and register all execution providers.
    manager
        .download_and_register_eps_with_progress(None, {
            let mut current_ep = String::new();
            move |ep_name: &str, percent: f64| {
                if ep_name != current_ep {
                    if !current_ep.is_empty() {
                        println!();
                    }
                    current_ep = ep_name.to_string();
                }
                print!("\r  {:<30}  {:5.1}%", ep_name, percent);
                io::stdout().flush().ok();
            }
        })
        .await?;
    println!();

    // Select and load a model from the catalog
    let model = manager.catalog().get_model("qwen2.5-0.5b").await?;

    if !model.is_cached().await? {
        println!("Downloading model...");
        model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    model.load().await?;
    println!("Model loaded and ready.");

    // Create a chat client
    let client = model.create_chat_client().temperature(0.7).max_tokens(512);

    // Start the conversation with a system prompt
    let mut messages: Vec<ChatCompletionRequestMessage> = vec![
        ChatCompletionRequestSystemMessage::from(
            "You are a helpful, friendly assistant. Keep your responses \
             concise and conversational. If you don't know something, say so.",
        )
        .into(),
    ];

    println!("\nChat assistant ready! Type 'quit' to exit.\n");

    let stdin = io::stdin();
    loop {
        print!("You: ");
        io::stdout().flush()?;

        let mut input = String::new();
        stdin.lock().read_line(&mut input)?;
        let input = input.trim();

        if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
            break;
        }

        // Add the user's message to conversation history
        messages.push(ChatCompletionRequestUserMessage::from(input).into());

        // Stream the response token by token
        print!("Assistant: ");
        io::stdout().flush()?;
        let mut full_response = String::new();
        let mut stream = client.complete_streaming_chat(&messages, None).await?;
        while let Some(chunk) = stream.next().await {
            let chunk = chunk?;
            if let Some(choice) = chunk.choices.first() {
                if let Some(ref content) = choice.delta.content {
                    print!("{content}");
                    io::stdout().flush()?;
                    full_response.push_str(content);
                }
            }
        }
        println!("\n");

        // Add the complete response to conversation history
        let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
            serde_json::json!({"role": "assistant", "content": full_response}),
        )?;
        messages.push(assistant_msg);
    }

    // Clean up - unload the model
    model.unload().await?;
    println!("Model unloaded. Goodbye!");

    Ok(())
}

运行聊天助理：

cargo run

会看到类似于以下内容的输出：

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

请注意助手如何记住先前轮次的上下文 — 当你问 “为什么对其他活物很重要？” 时，它知道你仍在谈论光合作用。

清理资源

卸载模型后，模型权重将保留在本地缓存中。这意味着下次运行应用程序时，将跳过下载步骤，模型加载速度更快。除非需要回收磁盘空间，否则不需要进行额外的清理。

反馈

此页面是否有帮助？

Last updated on 2026-04-09

通过

教程：使用 Foundry Local 生成多轮聊天助手

先决条件

示例存储库

安装软件包

浏览目录并选择模型

定义系统提示

实现多回合对话

添加流响应

完整代码

示例存储库

安装软件包

浏览目录并选择模型

定义系统提示

实现多回合对话

添加流响应

完整代码

示例存储库

安装软件包

浏览目录并选择模型

定义系统提示

实现多回合对话

添加流响应

完整代码

示例存储库

安装软件包

浏览目录并选择模型

定义系统提示

实现多回合对话

添加流响应

完整代码

清理资源

相关内容

反馈

其他资源