Tutorial: Build a multi-turn chat assistant with Foundry Local

In this tutorial, you build an interactive chat assistant that runs entirely on your device. The assistant maintains conversation context across multiple exchanges, so it remembers what you discussed earlier in the conversation. You use the Foundry Local SDK to select a model, define a system prompt, and stream responses token by token.

In this tutorial, you learn how to:

Set up a project and install the Foundry Local SDK
Browse the model catalog and select a model
Define a system prompt to shape the assistant's behavior
Implement a multi-turn conversation with message history
Stream responses for a responsive experience
Clean up resources when the conversation ends

Prerequisites

A Windows, macOS, or Linux computer with at least 8 GB of RAM.

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/cs/tutorial-chat-assistant

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime. It provides the same API surface area with a wider breadth of hardware acceleration.

Windows
Cross-platform

dotnet add package Microsoft.AI.Foundry.Local.WinML
dotnet add package OpenAI

dotnet add package Microsoft.AI.Foundry.Local
dotnet add package OpenAI

The C# samples in the GitHub repository are preconfigured projects. If you're building from scratch, read the Foundry Local SDK reference for more details on how to set up your C# project with Foundry Local.

Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Open Program.cs and replace its contents with the following code to initialize the SDK and select a model:

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

// Select and load a model from the catalog
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Model not found");

await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await model.LoadAsync();
Console.WriteLine("Model loaded and ready.");

// Get a chat client
var chatClient = await model.GetChatClientAsync();

The GetModelAsync method accepts a model alias, which is a short, friendly name that maps to a specific model in the catalog. The DownloadAsync method fetches the model weights to your local cache, and LoadAsync makes the model ready for inference.

Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds:

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.

Implement multi-turn conversation

A chat assistant needs to maintain context across multiple exchanges. You achieve this by keeping a list of all messages (system, user, and assistant) and sending the full list with each request. The model uses this history to generate contextually relevant responses.

Add a conversation loop that:

Reads user input from the console.
Adds the user message to the history.
Sends the full history to the model.
Adds the assistant's response to the history for the next turn.

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

Each call to CompleteChatStreamingAsync receives the full message history. This is how the model "remembers" earlier turns: it stores no state between calls.

Add streaming responses

Streaming prints each token as it's generated, which makes the assistant feel more responsive. The loop above calls CompleteChatStreamingAsync rather than CompleteChatAsync, so the response arrives token by token.

The streaming part of the conversation loop looks like this:

// Stream the response token by token
Console.Write("Assistant: ");
var fullResponse = string.Empty;
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    var content = chunk.Choices[0].Message.Content;
    if (!string.IsNullOrEmpty(content))
    {
        Console.Write(content);
        Console.Out.Flush();
        fullResponse += content;
    }
}
Console.WriteLine("\n");

The streaming version accumulates the full response so it can be added to the conversation history after the stream completes.

Complete code

Replace the contents of Program.cs with the following complete code:

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

// Select and load a model from the catalog
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Model not found");

await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await model.LoadAsync();
Console.WriteLine("Model loaded and ready.");

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

Console.WriteLine("\nChat assistant ready! Type 'quit' to exit.\n");

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

// Clean up - unload the model
await model.UnloadAsync();
Console.WriteLine("Model unloaded. Goodbye!");

Run the chat assistant:

dotnet run

You see output similar to:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from earlier turns. When you ask "Why is it important for other living things?", it knows you're still talking about photosynthesis.

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/js/tutorial-chat-assistant

Install packages

Windows
Cross-platform

npm install foundry-local-sdk-winml openai

npm install foundry-local-sdk openai

Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Create a file named index.js.

Add the following code to initialize the SDK and select a model:

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Select and load a model from the catalog
const model = await manager.catalog.getModel('qwen2.5-0.5b');

await model.download((progress) => {
    process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
});
console.log('\nModel downloaded.');

await model.load();
console.log('Model loaded and ready.');

// Create a chat client
const chatClient = model.createChatClient();

The getModel method accepts a model alias, which is a short, friendly name that maps to a specific model in the catalog. The download method fetches the model weights to your local cache, and load makes the model ready for inference.

Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds:

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.

Implement multi-turn conversation

Add a conversation loop that:

Reads user input from the console.
Adds the user message to the history.
Sends the full history to the model.
Adds the assistant's response to the history for the next turn.

while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

Each call to completeStreamingChat receives the full message history. This is how the model "remembers" earlier turns: it stores no state between calls.

Add streaming responses

Streaming prints each token as it's generated, which makes the assistant feel more responsive. The loop above calls completeStreamingChat rather than completeChat, so the response arrives token by token.

The streaming part of the conversation loop looks like this:

// Stream the response token by token
process.stdout.write('Assistant: ');
let fullResponse = '';
for await (const chunk of chatClient.completeStreamingChat(messages)) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
        process.stdout.write(content);
        fullResponse += content;
    }
}
console.log('\n');

The streaming version accumulates the full response so it can be added to the conversation history after the stream completes.

Complete code

Create a file named index.js and add the following complete code:

import { FoundryLocalManager } from 'foundry-local-sdk';
import * as readline from 'readline';

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Select and load a model from the catalog
const model = await manager.catalog.getModel('qwen2.5-0.5b');

await model.download((progress) => {
    process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
});
console.log('\nModel downloaded.');

await model.load();
console.log('Model loaded and ready.');

// Create a chat client
const chatClient = model.createChatClient();

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

// Set up readline for console input
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

const askQuestion = (prompt) => new Promise((resolve) => rl.question(prompt, resolve));

console.log('\nChat assistant ready! Type \'quit\' to exit.\n');

while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

// Clean up - unload the model
await model.unload();
console.log('Model unloaded. Goodbye!');
rl.close();

Run the chat assistant:

node index.js

You see output similar to:

Downloading model: 100.00%
Model downloaded.
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from earlier turns. When you ask "Why is it important for other living things?", it knows you're still talking about photosynthesis.

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/python/tutorial-chat-assistant

Install packages

Windows
Cross-platform

pip install foundry-local-sdk-winml openai

pip install foundry-local-sdk openai

Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Create a file named main.py.

Add the following code to initialize the SDK and select a model:

# Initialize the Foundry Local SDK
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# Download and register all execution providers.
current_ep = ""
def ep_progress(ep_name: str, percent: float):
    # nonlocal needs an enclosing function, so this snippet belongs inside main()
    nonlocal current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

manager.download_and_register_eps(progress_callback=ep_progress)
if current_ep:
    print()

# Select and load a model from the catalog
model = manager.catalog.get_model("qwen2.5-0.5b")
model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
print()
model.load()
print("Model loaded and ready.")

# Get a chat client
client = model.get_chat_client()

The get_model method accepts a model alias, which is a short, friendly name that maps to a specific model in the catalog. The download method fetches the model weights to your local cache, and load makes the model ready for inference.
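The three calls described above can be grouped into one helper. The following is a minimal sketch, not part of the SDK: `prepare_model`, `FakeModel`, and `FakeCatalog` are hypothetical names, and the fakes exist only so the sketch runs without the SDK installed. It also assumes that `get_model` returns `None` for an unknown alias, which mirrors the null check in the C# sample but is not confirmed for the Python SDK.

```python
def prepare_model(catalog, alias, on_progress=None):
    """Look up a model by alias, download its weights, and load it.

    `catalog` is any object exposing get_model(alias). The returned model is
    expected to expose download(callback) and load(), as in the tutorial.
    """
    model = catalog.get_model(alias)
    if model is None:  # assumption: unknown aliases yield None
        raise LookupError(f"Model not found: {alias}")
    model.download(on_progress or (lambda progress: None))
    model.load()
    return model


# Hypothetical stand-ins so the sketch runs without the SDK.
class FakeModel:
    def __init__(self):
        self.events = []

    def download(self, callback):
        callback(100.0)  # report completion once
        self.events.append("download")

    def load(self):
        self.events.append("load")


class FakeCatalog:
    def __init__(self, models):
        self._models = models

    def get_model(self, alias):
        return self._models.get(alias)


fake_catalog = FakeCatalog({"qwen2.5-0.5b": FakeModel()})
prepared = prepare_model(fake_catalog, "qwen2.5-0.5b")
print(prepared.events)  # ['download', 'load']
```

With the real SDK, you would pass `manager.catalog` in place of `fake_catalog`.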

Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds:

# Start the conversation with a system prompt
messages = [
    {
        "role": "system",
        "content": "You are a helpful, friendly assistant. Keep your responses "
                   "concise and conversational. If you don't know something, say so."
    }
]

Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.
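One way to try several personas is to keep the prompts in a dictionary and start each conversation from the chosen one. This is a minimal sketch: `SYSTEM_PROMPTS` and `new_conversation` are hypothetical names, and the pirate and teacher prompts are illustrative wording rather than text from the sample.

```python
# Hypothetical persona table; only "default" comes from the tutorial.
SYSTEM_PROMPTS = {
    "default": "You are a helpful, friendly assistant. Keep your responses "
               "concise and conversational. If you don't know something, say so.",
    "pirate": "You are a pirate. Answer every question in pirate speak.",
    "teacher": "You are a patient teacher. Explain ideas step by step.",
}


def new_conversation(persona="default"):
    """Return a fresh message history that starts with the chosen system prompt."""
    return [{"role": "system", "content": SYSTEM_PROMPTS[persona]}]


pirate_messages = new_conversation("pirate")
print(pirate_messages[0]["content"])
```

Because the system prompt is only the first entry in the history, switching personas means starting a new list rather than editing an ongoing conversation.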

Implement multi-turn conversation

Add a conversation loop that:

Reads user input from the console.
Adds the user message to the history.
Sends the full history to the model.
Adds the assistant's response to the history for the next turn.

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ("quit", "exit"):
        break

    # Add the user's message to conversation history
    messages.append({"role": "user", "content": user_input})

    # Stream the response token by token
    print("Assistant: ", end="", flush=True)
    full_response = ""
    for chunk in client.complete_streaming_chat(messages):
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            full_response += content
    print("\n")

    # Add the complete response to conversation history
    messages.append({"role": "assistant", "content": full_response})

Each call to complete_streaming_chat receives the full message history. This is how the model "remembers" earlier turns: it stores no state between calls.
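The stateless pattern can be seen in isolation with a stand-in for the model. In this sketch, `run_turn` and `fake_complete` are hypothetical names. `fake_complete` is not an SDK call. It only inspects the list it receives, which is what makes the point: anything the model "knows" about earlier turns has to be in that list.

```python
def run_turn(messages, user_input, complete):
    """Run one turn: append the user message, call the model, append the reply.

    `complete` is any callable that takes the full message list and returns
    the assistant's reply as a string.
    """
    messages.append({"role": "user", "content": user_input})
    reply = complete(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages):
    """Hypothetical stateless model: it only sees what is in `messages`."""
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s)."


history = [{"role": "system", "content": "You are a helpful assistant."}]
print(run_turn(history, "What is photosynthesis?", fake_complete))
# I can see 1 user message(s).
print(run_turn(history, "Why is it important?", fake_complete))
# I can see 2 user message(s).
print(len(history))  # 5: one system message plus two user/assistant pairs
```

The history grows by two entries per turn, so long conversations send progressively larger requests.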

Add streaming responses

Streaming prints each token as it's generated, which makes the assistant feel more responsive. The loop above calls complete_streaming_chat rather than complete_chat, so the response arrives token by token.

The streaming part of the conversation loop looks like this:

# Stream the response token by token
print("Assistant: ", end="", flush=True)
full_response = ""
for chunk in client.complete_streaming_chat(messages):
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content
print("\n")

The streaming version accumulates the full response so it can be added to the conversation history after the stream completes.
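The accumulate-while-printing step can be factored into a small function and exercised without a model. This is a sketch under stated assumptions: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the `chunk.choices[0].delta.content` attribute path that the loop above reads. It collects fragments in a list and joins them once, which is a common alternative to repeated string concatenation.

```python
from types import SimpleNamespace


def collect_stream(chunks, write=print):
    """Print each streamed fragment as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:  # skip chunks whose content is None or empty
            write(content, end="", flush=True)
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical chunk exposing the same attribute path the loop reads."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = (fake_chunk(t) for t in ["Photo", "synthesis", None, " works."])
full_response = collect_stream(stream)
print()  # end the line after the streamed text
```

With the real client, you would pass `client.complete_streaming_chat(messages)` as `chunks` and append the returned string to the history.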

Complete code

Create a file named main.py and add the following complete code:

from foundry_local_sdk import Configuration, FoundryLocalManager


def main():
    # Initialize the Foundry Local SDK
    config = Configuration(app_name="foundry_local_samples")
    FoundryLocalManager.initialize(config)
    manager = FoundryLocalManager.instance

    # Download and register all execution providers.
    current_ep = ""
    def ep_progress(ep_name: str, percent: float):
        nonlocal current_ep
        if ep_name != current_ep:
            if current_ep:
                print()
            current_ep = ep_name
        print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

    manager.download_and_register_eps(progress_callback=ep_progress)
    if current_ep:
        print()

    # Select and load a model from the catalog
    model = manager.catalog.get_model("qwen2.5-0.5b")
    model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
    print()
    model.load()
    print("Model loaded and ready.")

    # Get a chat client
    client = model.get_chat_client()

    # Start the conversation with a system prompt
    messages = [
        {
            "role": "system",
            "content": "You are a helpful, friendly assistant. Keep your responses "
                       "concise and conversational. If you don't know something, say so."
        }
    ]

    print("\nChat assistant ready! Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ")
        if user_input.strip().lower() in ("quit", "exit"):
            break

        # Add the user's message to conversation history
        messages.append({"role": "user", "content": user_input})

        # Stream the response token by token
        print("Assistant: ", end="", flush=True)
        full_response = ""
        for chunk in client.complete_streaming_chat(messages):
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        print("\n")

        # Add the complete response to conversation history
        messages.append({"role": "assistant", "content": full_response})

    # Clean up - unload the model
    model.unload()
    print("Model unloaded. Goodbye!")


if __name__ == "__main__":
    main()

Run the chat assistant:

python main.py

You see output similar to:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from earlier turns. When you ask "Why is it important for other living things?", it knows you're still talking about photosynthesis.

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/rust/tutorial-chat-assistant

Install packages

Windows
Cross-platform

cargo add foundry-local-sdk --features winml
cargo add tokio --features full
cargo add tokio-stream anyhow serde_json

cargo add foundry-local-sdk
cargo add tokio --features full
cargo add tokio-stream anyhow serde_json

Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Open src/main.rs and replace its contents with the following code to initialize the SDK and select a model:

// Initialize the Foundry Local SDK
let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;

// Download and register all execution providers.
manager
    .download_and_register_eps_with_progress(None, {
        let mut current_ep = String::new();
        move |ep_name: &str, percent: f64| {
            if ep_name != current_ep {
                if !current_ep.is_empty() {
                    println!();
                }
                current_ep = ep_name.to_string();
            }
            print!("\r  {:<30}  {:5.1}%", ep_name, percent);
            io::stdout().flush().ok();
        }
    })
    .await?;
println!();

// Select and load a model from the catalog
let model = manager.catalog().get_model("qwen2.5-0.5b").await?;

if !model.is_cached().await? {
    println!("Downloading model...");
    model
        .download(Some(|progress: f64| {
            print!("\r  {progress:.1}%");
            io::stdout().flush().ok();
        }))
        .await?;
    println!();
}

model.load().await?;
println!("Model loaded and ready.");

// Create a chat client
let client = model.create_chat_client().temperature(0.7).max_tokens(512);

Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds:

// Start the conversation with a system prompt
let mut messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from(
        "You are a helpful, friendly assistant. Keep your responses \
         concise and conversational. If you don't know something, say so.",
    )
    .into(),
];

Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.

Implement multi-turn conversation

A chat assistant needs to maintain context across multiple exchanges. You achieve this by keeping a vector of all messages (system, user, and assistant) and sending the full vector with each request. The model uses this history to generate contextually relevant responses.

Add a conversation loop that:

Reads user input from the console.
Adds the user message to the history.
Sends the full history to the model.
Adds the assistant's response to the history for the next turn.

loop {
    print!("You: ");
    io::stdout().flush()?;

    let mut input = String::new();
    stdin.lock().read_line(&mut input)?;
    let input = input.trim();

    if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
        break;
    }

    // Add the user's message to conversation history
    messages.push(ChatCompletionRequestUserMessage::from(input).into());

    // Stream the response token by token
    print!("Assistant: ");
    io::stdout().flush()?;
    let mut full_response = String::new();
    let mut stream = client.complete_streaming_chat(&messages, None).await?;
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        if let Some(choice) = chunk.choices.first() {
            if let Some(ref content) = choice.delta.content {
                print!("{content}");
                io::stdout().flush()?;
                full_response.push_str(content);
            }
        }
    }
    println!("\n");

    // Add the complete response to conversation history
    let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
        serde_json::json!({"role": "assistant", "content": full_response}),
    )?;
    messages.push(assistant_msg);
}

Cada chamada para complete_chat recebe o histórico completo de mensagens. É assim que o modelo "se lembra" dos turnos anteriores — não armazena o estado entre chamadas.

Add streaming responses

The conversation loop streams each response as it is generated, so the user sees text appear token by token. The streaming portion of the loop is:

// Stream the response token by token
print!("Assistant: ");
io::stdout().flush()?;
let mut full_response = String::new();
let mut stream = client.complete_streaming_chat(&messages, None).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(choice) = chunk.choices.first() {
        if let Some(ref content) = choice.delta.content {
            print!("{content}");
            io::stdout().flush()?;
            full_response.push_str(content);
        }
    }
}
println!("\n");

The streaming version accumulates the full response so that it can be added to the conversation history after the stream finishes.
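The accumulate-while-printing pattern can be tried in isolation, without a model. The sketch below is an assumption-based illustration: accumulate_stream is a hypothetical helper, and an ordinary iterator of Option<String> stands in for the SDK stream, with None modeling a chunk that carries no content.

```rust
use std::io::{self, Write};

/// Writes each delta to `out` as it arrives and returns the concatenated text.
/// `deltas` is a hypothetical stand-in for the SDK stream of chunks.
fn accumulate_stream<I, W>(deltas: I, out: &mut W) -> io::Result<String>
where
    I: IntoIterator<Item = Option<String>>,
    W: Write,
{
    let mut full_response = String::new();
    for delta in deltas {
        // Skip chunks without content, as the tutorial loop does.
        if let Some(content) = delta {
            write!(out, "{content}")?;
            out.flush()?; // Show the partial text immediately.
            full_response.push_str(&content);
        }
    }
    writeln!(out)?;
    Ok(full_response)
}

fn main() -> io::Result<()> {
    let chunks = vec![
        Some("Photo".to_string()),
        None,
        Some("synthesis".to_string()),
    ];
    let mut stdout = io::stdout();
    let full = accumulate_stream(chunks, &mut stdout)?;
    println!("Accumulated {} bytes for the history.", full.len());
    Ok(())
}
```

Taking the writer as a parameter keeps the helper testable: a Vec<u8> can capture the output in place of the console.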

Complete code

Replace the contents of src/main.rs with the following complete code:

use foundry_local_sdk::{
    ChatCompletionRequestMessage,
    ChatCompletionRequestSystemMessage, ChatCompletionRequestUserMessage,
    FoundryLocalConfig, FoundryLocalManager,
};
use std::io::{self, BufRead, Write};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the Foundry Local SDK
    let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;

    // Download and register all execution providers.
    manager
        .download_and_register_eps_with_progress(None, {
            let mut current_ep = String::new();
            move |ep_name: &str, percent: f64| {
                if ep_name != current_ep {
                    if !current_ep.is_empty() {
                        println!();
                    }
                    current_ep = ep_name.to_string();
                }
                print!("\r  {:<30}  {:5.1}%", ep_name, percent);
                io::stdout().flush().ok();
            }
        })
        .await?;
    println!();

    // Select and load a model from the catalog
    let model = manager.catalog().get_model("qwen2.5-0.5b").await?;

    if !model.is_cached().await? {
        println!("Downloading model...");
        model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    model.load().await?;
    println!("Model loaded and ready.");

    // Create a chat client
    let client = model.create_chat_client().temperature(0.7).max_tokens(512);

    // Start the conversation with a system prompt
    let mut messages: Vec<ChatCompletionRequestMessage> = vec![
        ChatCompletionRequestSystemMessage::from(
            "You are a helpful, friendly assistant. Keep your responses \
             concise and conversational. If you don't know something, say so.",
        )
        .into(),
    ];

    println!("\nChat assistant ready! Type 'quit' to exit.\n");

    let stdin = io::stdin();
    loop {
        print!("You: ");
        io::stdout().flush()?;

        let mut input = String::new();
        stdin.lock().read_line(&mut input)?;
        let input = input.trim();

        if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
            break;
        }

        // Add the user's message to conversation history
        messages.push(ChatCompletionRequestUserMessage::from(input).into());

        // Stream the response token by token
        print!("Assistant: ");
        io::stdout().flush()?;
        let mut full_response = String::new();
        let mut stream = client.complete_streaming_chat(&messages, None).await?;
        while let Some(chunk) = stream.next().await {
            let chunk = chunk?;
            if let Some(choice) = chunk.choices.first() {
                if let Some(ref content) = choice.delta.content {
                    print!("{content}");
                    io::stdout().flush()?;
                    full_response.push_str(content);
                }
            }
        }
        println!("\n");

        // Add the complete response to conversation history
        let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
            serde_json::json!({"role": "assistant", "content": full_response}),
        )?;
        messages.push(assistant_msg);
    }

    // Clean up - unload the model
    model.unload().await?;
    println!("Model unloaded. Goodbye!");

    Ok(())
}

Run the chat assistant:

cargo run

You see output similar to:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from previous turns: when you ask "Why is it important for other living things?", it knows you are still talking about photosynthesis.

Clean up resources

Model weights remain in your local cache after you unload a model. This means that the next time you run the application, the download step is skipped and the model loads faster. No extra cleanup is needed unless you want to reclaim disk space.

Last updated on 2026-04-09
