How to stream agent responses

2025-05-23

What is a streamed response?

A streamed response delivers message content in small, incremental chunks. This improves the user experience by letting users view and interact with the message as it unfolds, instead of waiting for the entire response to load. Users can start processing information immediately, which makes the application feel more responsive and interactive. As a result, streaming minimizes perceived delays and keeps users engaged throughout the exchange.
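To make the idea concrete, here is a minimal sketch in plain Python with asyncio. It does not use Semantic Kernel: fake_model_stream is a hypothetical stand-in for a model that yields its reply a few characters at a time, and render prints each chunk as soon as it arrives instead of waiting for the whole reply.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_model_stream(text: str, chunk_size: int = 4) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model: yields the reply in small pieces."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # simulate waiting for the next network chunk
        yield text[start:start + chunk_size]


async def render(text: str) -> list[str]:
    """Print each chunk as soon as it arrives and return the chunks seen."""
    received: list[str] = []
    async for chunk in fake_model_stream(text):
        print(chunk, end="", flush=True)  # the user sees partial output immediately
        received.append(chunk)
    print()
    return received


if __name__ == "__main__":
    chunks = asyncio.run(render("Hello! How can I help?"))
```

Joining the received chunks reproduces the full reply; the only difference from a non-streamed call is when each piece becomes visible.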

Streaming references

Streaming in Semantic Kernel

AI services that support streaming in Semantic Kernel use different content types than those used for fully formed messages. These content types are designed specifically to handle the incremental nature of streamed data. The Agent Framework uses the same content types for the same purpose, which keeps the handling of streamed information consistent across both.
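As an illustration only, the sketch below models the idea with a hypothetical StreamingChunk dataclass (not a Semantic Kernel type): each chunk carries a text fragment, adding two chunks of the same message concatenates them, and the merged result is converted into an equally hypothetical fully formed ChatMessage. Semantic Kernel's own streaming types have their own fields and behavior; see the API reference for those.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamingChunk:
    """Hypothetical incremental content type; not the Semantic Kernel class."""
    role: str
    content: str
    choice_index: int = 0

    def __add__(self, other: "StreamingChunk") -> "StreamingChunk":
        # Only fragments of the same message (same role and choice) can be merged.
        if (self.role, self.choice_index) != (other.role, other.choice_index):
            raise ValueError("cannot merge chunks from different messages")
        return StreamingChunk(self.role, self.content + other.content, self.choice_index)


@dataclass(frozen=True)
class ChatMessage:
    """Hypothetical fully formed message produced once streaming ends."""
    role: str
    content: str


def to_message(chunks: list[StreamingChunk]) -> ChatMessage:
    """Fold the chunks together and convert the result to a complete message."""
    if not chunks:
        raise ValueError("no chunks received")
    merged = chunks[0]
    for chunk in chunks[1:]:
        merged = merged + chunk
    return ChatMessage(role=merged.role, content=merged.content)
```

Keeping the chunk type separate from the complete-message type makes it explicit in the code which values may still be partial.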

Tip

API reference:

Tip

API reference:

Feature currently unavailable in Java.

Streaming a response from `ChatCompletionAgent`

When you invoke a streamed response from a ChatCompletionAgent, the ChatHistory in the AgentThread is updated only after the full response has been received. Although the response is streamed incrementally, the history records only the complete message. This ensures that the ChatHistory consistently reflects fully formed responses.
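A minimal sketch of that behavior, assuming a hypothetical in-memory HistoryThread class and a fake invoke_stream generator (neither is part of Semantic Kernel): the caller sees every chunk as it arrives, but the history gains exactly one assistant entry, the fully assembled reply, and only after the stream has finished.

```python
import asyncio
from collections.abc import AsyncIterator


class HistoryThread:
    """Hypothetical thread that stores only complete (role, text) messages."""

    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []


async def invoke_stream(prompt: str, thread: HistoryThread) -> AsyncIterator[str]:
    """Hypothetical agent call: yields reply chunks, then records the full reply."""
    thread.history.append(("user", prompt))
    reply_parts: list[str] = []
    for part in ["The special ", "soup is ", "Clam Chowder."]:
        await asyncio.sleep(0)  # simulate latency between chunks
        reply_parts.append(part)
        yield part  # the caller sees each chunk immediately
    # Runs only after the last chunk: the history gets one complete message.
    thread.history.append(("assistant", "".join(reply_parts)))


async def main() -> HistoryThread:
    thread = HistoryThread()
    async for chunk in invoke_stream("What is the special soup?", thread):
        # While streaming, the assistant reply is not yet in the history.
        assert thread.history[-1][0] == "user"
        print(chunk, end="", flush=True)
    print()
    return thread


if __name__ == "__main__":
    asyncio.run(main())
```

In this sketch, if the consumer stops iterating early (for example with break), the final append never runs, so the history never records a partial reply.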

// Define agent
ChatCompletionAgent agent = ...;

ChatHistoryAgentThread agentThread = new();

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's also possible to read the messages that were added to the ChatHistoryAgentThread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread

# Define agent
agent = ChatCompletionAgent(...)

# Create a thread object to maintain the conversation state.
# If no thread is provided one will be created and returned with
# the initial response.
thread: ChatHistoryAgentThread | None = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
    # Process streamed response(s)...
    thread = response.thread

Feature currently unavailable in Java.

Streaming a response from `OpenAIAssistantAgent`

When you invoke a streamed response from an OpenAIAssistantAgent, the assistant keeps the conversation state in a remote thread. You can read the messages from the remote thread if needed.
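To picture what "remote" means here, the following is a minimal sketch using a hypothetical in-process RemoteStore that stands in for the service (it is not the OpenAI Assistants API or a Semantic Kernel class). The client-side ThreadHandle, also hypothetical, holds only a thread ID; the messages live in the store, and a second handle created from the same ID sees the same conversation until the thread is deleted.

```python
import uuid
from typing import Optional


class RemoteStore:
    """Hypothetical stand-in for the service that owns thread state."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = {}

    def create(self) -> str:
        thread_id = f"thread_{uuid.uuid4().hex[:8]}"
        self._threads[thread_id] = []
        return thread_id

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def messages(self, thread_id: str) -> list[str]:
        return list(self._threads[thread_id])  # return a copy, not the stored list

    def delete(self, thread_id: str) -> None:
        del self._threads[thread_id]

    def exists(self, thread_id: str) -> bool:
        return thread_id in self._threads


class ThreadHandle:
    """Hypothetical client-side handle: stores only the remote thread ID."""

    def __init__(self, store: RemoteStore, thread_id: Optional[str] = None) -> None:
        self.store = store
        # Reuse an existing remote thread if an ID is given, otherwise create one.
        self.id = thread_id if thread_id is not None else store.create()

    def add(self, message: str) -> None:
        self.store.append(self.id, message)

    def get_messages(self) -> list[str]:
        return self.store.messages(self.id)

    def delete(self) -> None:
        self.store.delete(self.id)
```

Usage: `first = ThreadHandle(store)` creates a thread, and `ThreadHandle(store, first.id)` attaches to that same thread, which mirrors the idea of passing an existing thread ID to a thread constructor.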

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient);

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

To create a thread from an existing thread Id, pass the Id to the OpenAIAssistantAgentThread constructor:

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient, "your-existing-thread-id");

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation.
# If no thread is provided one will be created and returned with
# the initial response.
thread: AssistantAgentThread | None = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
    # Process streamed response(s)...
    thread = response.thread

# Read the messages from the remote thread
async for message in thread.get_messages():
    # Process messages...
    print(message.content)

# Delete the thread
await thread.delete()

To create a thread from an existing thread_id, pass it to the AssistantAgentThread constructor:

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread object that refers to an existing remote thread.
# `client` is assumed to be the same OpenAI / Azure OpenAI client used by the agent.
thread = AssistantAgentThread(client=client, thread_id="your-existing-thread-id")

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
    # Process streamed response(s)...
    thread = response.thread

# Delete the thread
await thread.delete()

Feature currently unavailable in Java.

Handling intermediate messages with a streaming response

Because streaming responses let LLMs return incremental chunks of text, a UI or console can render output sooner, without waiting for the entire response to complete. In addition, a caller may want to handle intermediate content, such as the results of function calls. You can do this by supplying a callback function when you invoke the streaming response. The callback receives complete messages wrapped in ChatMessageContent.
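The callback pattern can be sketched without Semantic Kernel as follows. Everything here is hypothetical: Message, get_price, and the fake invoke_stream stand in for the real content types and agent. The point it shows is the split of responsibilities: intermediate steps (a function call and its result) are handed to the callback as whole messages, while the final answer is yielded to the caller as text chunks.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical complete message passed to the callback."""
    kind: str  # "function_call" or "function_result"
    text: str


Callback = Callable[[Message], Awaitable[None]]


def get_price(item: str) -> str:
    """Hypothetical tool the fake agent calls."""
    return "$9.99"


async def invoke_stream(prompt: str, on_intermediate_message: Optional[Callback] = None) -> AsyncIterator[str]:
    """Hypothetical agent: runs one tool call, then streams the final answer.

    The fake ignores `prompt` and always answers the same question.
    """
    call = Message("function_call", "get_price(item='Chai Tea')")
    result = Message("function_result", get_price("Chai Tea"))
    for intermediate in (call, result):
        # Intermediate steps are delivered whole, never as chunks.
        if on_intermediate_message is not None:
            await on_intermediate_message(intermediate)
    for chunk in ["Chai Tea ", "costs ", result.text, "."]:
        await asyncio.sleep(0)
        yield chunk


async def main() -> tuple[list[Message], str]:
    seen: list[Message] = []

    async def handle(message: Message) -> None:
        seen.append(message)
        print(f"[{message.kind}] {message.text}")

    answer = ""
    async for chunk in invoke_stream("How much is the Chai Tea?", on_intermediate_message=handle):
        answer += chunk
        print(chunk, end="", flush=True)
    print()
    return seen, answer


if __name__ == "__main__":
    asyncio.run(main())
```

If no callback is passed, the sketch still streams the final answer but the intermediate steps are simply not reported.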

Callback documentation for the AzureAIAgent is coming soon.

Setting the on_intermediate_message callback in agent.invoke_stream(...) lets the caller receive the intermediate messages generated while the agent formulates its final response.

import asyncio
from typing import Annotated

from semantic_kernel.agents import AzureResponsesAgent
from semantic_kernel.contents import AuthorRole, ChatMessageContent, FunctionCallContent, FunctionResultContent
from semantic_kernel.functions import kernel_function


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"

# This callback function will be called for each intermediate message,
# which will allow one to handle FunctionCallContent and FunctionResultContent.
# If the callback is not provided, the agent will return the final response
# with no intermediate tool call steps.
async def handle_streaming_intermediate_steps(message: ChatMessageContent) -> None:
    for item in message.items or []:
        if isinstance(item, FunctionResultContent):
            print(f"Function Result:> {item.result} for function: {item.name}")
        elif isinstance(item, FunctionCallContent):
            print(f"Function Call:> {item.name} with arguments: {item.arguments}")
        else:
            print(f"{item}")

# Simulate a conversation with the agent
USER_INPUTS = [
    "Hello",
    "What is the special soup?",
    "What is the special drink?",
    "How much is it?",
    "Thank you",
]


async def main():
    # 1. Create the client using OpenAI resources and configuration
    client, model = AzureResponsesAgent.setup_resources()

    # 2. Create a Semantic Kernel agent for the OpenAI Responses API
    agent = AzureResponsesAgent(
        ai_model_id=model,
        client=client,
        instructions="Answer questions about the menu.",
        name="Host",
        plugins=[MenuPlugin()],
    )

    # 3. Create a thread for the agent
    # If no thread is provided, a new thread will be
    # created and returned with the initial response
    thread = None

    try:
        for user_input in USER_INPUTS:
            print(f"# {AuthorRole.USER}: '{user_input}'")

            first_chunk = True
            async for response in agent.invoke_stream(
                messages=user_input,
                thread=thread,
                on_intermediate_message=handle_streaming_intermediate_steps,
            ):
                thread = response.thread
                if first_chunk:
                    print(f"# {response.name}: ", end="", flush=True)
                    first_chunk = False
                print(response.content, end="", flush=True)
            print()
    finally:
        if thread:
            await thread.delete()

if __name__ == "__main__":
    asyncio.run(main())

The following shows sample output from the agent invocation:

Sample Output:

# AuthorRole.USER: 'Hello'
# Host: Hello! How can I assist you with the menu today?
# AuthorRole.USER: 'What is the special soup?'
Function Call:> MenuPlugin-get_specials with arguments: {}
Function Result:>
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        for function: MenuPlugin-get_specials
# Host: The special soup today is Clam Chowder. Would you like to know more about it or hear about other specials?
# AuthorRole.USER: 'What is the special drink?'
# Host: The special drink today is Chai Tea. Would you like more details or are you interested in ordering it?
# AuthorRole.USER: 'How much is it?'
Function Call:> MenuPlugin-get_item_price with arguments: {"menu_item":"Chai Tea"}
Function Result:> $9.99 for function: MenuPlugin-get_item_price
# Host: The special drink, Chai Tea, is $9.99. Would you like to order one or need information on something else?
# AuthorRole.USER: 'Thank you'
# Host: You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your day!

Feature currently unavailable in Java.

Next steps

Using templates with agents

Agent orchestration
