How to stream agent responses

What is a streamed response?

A streamed response delivers the message content in small, incremental chunks. This improves the user experience: users can read and engage with the message as it unfolds instead of waiting for the entire response to load. They can begin processing information immediately, which makes the interaction feel more responsive, reduces perceived delay, and keeps users engaged throughout the exchange.
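As a rough illustration of incremental delivery, the sketch below uses only the Python standard library. The names `stream_reply` and `render` are hypothetical and are not part of Semantic Kernel; the `asyncio.sleep(0)` call merely stands in for the delay between chunks arriving from a service.

```python
import asyncio
from collections.abc import AsyncIterator


async def stream_reply(text: str, chunk_size: int = 8) -> AsyncIterator[str]:
    """Yield `text` in small pieces, the way a streaming service might."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # stand-in for latency between chunks
        yield text[start:start + chunk_size]


async def render(text: str) -> list[str]:
    """Print each chunk as soon as it arrives and return the chunks received."""
    received: list[str] = []
    async for chunk in stream_reply(text):
        print(chunk, end="", flush=True)  # the user sees partial output at once
        received.append(chunk)
    print()
    return received


if __name__ == "__main__":
    asyncio.run(render("The special soup today is Clam Chowder."))
```

The consumer never holds the whole reply before it starts printing, which is the property that makes a streamed response feel responsive.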

Streaming references

Streaming in Semantic Kernel

AI services that support streaming in Semantic Kernel use different content types than those used for fully formed messages. These content types are designed specifically to handle the incremental nature of streamed data. The Agent Framework uses the same content types for the same purpose, which keeps the two systems consistent when working with streamed information.
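One reason streamed data benefits from its own content types is that partial pieces eventually need to be merged into a complete message. The following is a minimal sketch of that idea; `TextChunk` and `merge` are hypothetical names invented for illustration and do not reflect how Semantic Kernel implements its streaming types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextChunk:
    """Hypothetical stand-in for a streaming content type."""

    role: str
    text: str

    def __add__(self, other: "TextChunk") -> "TextChunk":
        # Chunks of one message share a role; merging concatenates the text.
        if self.role != other.role:
            raise ValueError("cannot merge chunks from different roles")
        return TextChunk(self.role, self.text + other.text)


def merge(chunks: list[TextChunk]) -> TextChunk:
    """Fold a non-empty list of chunks into one complete message."""
    if not chunks:
        raise ValueError("no chunks to merge")
    full = chunks[0]
    for chunk in chunks[1:]:
        full = full + chunk
    return full
```

A fully formed message type has no need for this kind of merge operation, which is why the two families of types can reasonably differ.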

Tip

API reference:

This feature is currently unavailable in Java.

Streamed response from `ChatCompletionAgent`

When you invoke a streamed response from a ChatCompletionAgent, the ChatHistory in the AgentThread is updated only after the full response has been received. Although the response is streamed incrementally, the history records only the complete message, so the ChatHistory always reflects fully formed responses.

// Define agent
ChatCompletionAgent agent = ...;

ChatHistoryAgentThread agentThread = new();

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's also possible to read the messages that were added to the ChatHistoryAgentThread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread

# Define agent
agent = ChatCompletionAgent(...)

# Create a thread object to maintain the conversation state.
# If no thread is provided one will be created and returned with
# the initial response.
thread: ChatHistoryAgentThread = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread
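The rule that the history records only complete messages can be sketched with a toy model. Everything here (`ToyThread`, `toy_invoke_stream`, the canned reply) is hypothetical and built on the standard library; it imitates the behavior described in this section and is not how Semantic Kernel is implemented.

```python
import asyncio
from collections.abc import AsyncIterator


class ToyThread:
    """Hypothetical stand-in for a thread that stores complete messages."""

    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []


async def toy_invoke_stream(user_input: str, thread: ToyThread) -> AsyncIterator[str]:
    """Stream a canned reply; record it in the history only once complete."""
    thread.history.append(("user", user_input))
    parts = ["Hello", ", ", "how can I help?"]
    for part in parts:
        await asyncio.sleep(0)  # stand-in for latency between chunks
        yield part
    # Reached only after the consumer has pulled every chunk.
    thread.history.append(("assistant", "".join(parts)))


async def demo() -> tuple[list[int], list[tuple[str, str]]]:
    thread = ToyThread()
    sizes: list[int] = []
    async for _ in toy_invoke_stream("Hi", thread):
        sizes.append(len(thread.history))  # history length observed mid-stream
    return sizes, thread.history
```

While chunks are still arriving, the toy history contains only the user message; the assistant entry appears once, as a single complete message, after the stream ends.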

This feature is currently unavailable in Java.

Streamed response from `OpenAIAssistantAgent`

When you invoke a streamed response from an OpenAIAssistantAgent, the assistant maintains the conversation state as a remote thread. If needed, you can read the messages back from the remote thread.

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient);

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

To create a thread from an existing Id, pass it to the OpenAIAssistantAgentThread constructor.

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient, "your-existing-thread-id");

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation.
# If no thread is provided one will be created and returned with
# the initial response.
thread: AssistantAgentThread = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread

# Read the messages from the remote thread
async for response in thread.get_messages():
  # Process messages...
  pass

# Delete the thread
await thread.delete()

To create a thread from an existing thread_id, pass it to the AssistantAgentThread constructor.

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation,
# resuming an existing conversation by its thread ID.
thread = AssistantAgentThread(client=client, thread_id="your-existing-thread-id")

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread

# Delete the thread
await thread.delete()

This feature is currently unavailable in Java.

Handling intermediate messages with a streamed response

Because responses are streamed, LLMs can return text incrementally, which lets a UI or console render output sooner without waiting for the whole response to complete. A caller may also want to handle intermediate content, such as the results of function calls. To do this, supply a callback function when invoking the streamed response. The callback receives complete messages encapsulated in ChatMessageContent.

Callback documentation for the AzureAIAgent is coming soon.

Configuring the on_intermediate_message callback in agent.invoke_stream(...) lets the caller receive the intermediate messages generated while the agent formulates its final response.

import asyncio
from typing import Annotated

from semantic_kernel.agents import AzureResponsesAgent
from semantic_kernel.contents import AuthorRole, ChatMessageContent, FunctionCallContent, FunctionResultContent
from semantic_kernel.functions import kernel_function


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"

# This callback function will be called for each intermediate message,
# which will allow one to handle FunctionCallContent and FunctionResultContent.
# If the callback is not provided, the agent will return the final response
# with no intermediate tool call steps.
async def handle_streaming_intermediate_steps(message: ChatMessageContent) -> None:
    for item in message.items or []:
        if isinstance(item, FunctionResultContent):
            print(f"Function Result:> {item.result} for function: {item.name}")
        elif isinstance(item, FunctionCallContent):
            print(f"Function Call:> {item.name} with arguments: {item.arguments}")
        else:
            print(f"{item}")

# Simulate a conversation with the agent
USER_INPUTS = [
    "Hello",
    "What is the special soup?",
    "What is the special drink?",
    "How much is it?",
    "Thank you",
]


async def main():
    # 1. Create the client using OpenAI resources and configuration
    client, model = AzureResponsesAgent.setup_resources()

    # 2. Create a Semantic Kernel agent for the OpenAI Responses API
    agent = AzureResponsesAgent(
        ai_model_id=model,
        client=client,
        instructions="Answer questions about the menu.",
        name="Host",
        plugins=[MenuPlugin()],
    )

    # 3. Create a thread for the agent
    # If no thread is provided, a new thread will be
    # created and returned with the initial response
    thread = None

    try:
        for user_input in USER_INPUTS:
            print(f"# {AuthorRole.USER}: '{user_input}'")

            first_chunk = True
            async for response in agent.invoke_stream(
                messages=user_input,
                thread=thread,
                on_intermediate_message=handle_streaming_intermediate_steps,
            ):
                thread = response.thread
                if first_chunk:
                    print(f"# {response.name}: ", end="", flush=True)
                    first_chunk = False
                print(response.content, end="", flush=True)
            print()
    finally:
        await thread.delete() if thread else None

if __name__ == "__main__":
    asyncio.run(main())

The following shows sample output from the agent invocation:

Sample Output:

# AuthorRole.USER: 'Hello'
# Host: Hello! How can I assist you with the menu today?
# AuthorRole.USER: 'What is the special soup?'
Function Call:> MenuPlugin-get_specials with arguments: {}
Function Result:>
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        for function: MenuPlugin-get_specials
# Host: The special soup today is Clam Chowder. Would you like to know more about it or hear about other specials?
# AuthorRole.USER: 'What is the special drink?'
# Host: The special drink today is Chai Tea. Would you like more details or are you interested in ordering it?
# AuthorRole.USER: 'How much is it?'
Function Call:> MenuPlugin-get_item_price with arguments: {"menu_item":"Chai Tea"}
Function Result:> $9.99 for function: MenuPlugin-get_item_price
# Host: The special drink, Chai Tea, is $9.99. Would you like to order one or need information on something else?
# AuthorRole.USER: 'Thank you'
# Host: You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your day!
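The separation between streamed text and intermediate messages can be reduced to a small standard-library sketch. The function `toy_invoke_stream`, its canned tool call, and the string-based messages are all hypothetical; only the parameter name `on_intermediate_message` is borrowed from the sample above, and the real agent passes ChatMessageContent objects rather than strings.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import Optional

Callback = Callable[[str], Awaitable[None]]


async def toy_invoke_stream(
    question: str, on_intermediate_message: Optional[Callback] = None
) -> AsyncIterator[str]:
    """Stream a canned answer, reporting a simulated tool call out of band."""
    if on_intermediate_message is not None:
        # Complete intermediate messages go to the callback, not the stream.
        await on_intermediate_message("call: get_specials()")
        await on_intermediate_message("result: Clam Chowder")
    for chunk in ["The special soup ", "is Clam Chowder."]:
        await asyncio.sleep(0)  # stand-in for latency between chunks
        yield chunk


async def demo() -> tuple[list[str], str]:
    intermediate: list[str] = []

    async def collect(message: str) -> None:
        intermediate.append(message)

    chunks = [c async for c in toy_invoke_stream("What is the special soup?", collect)]
    return intermediate, "".join(chunks)
```

Keeping intermediate messages on a separate channel means the consumer of the stream only ever concatenates answer text, while tool activity can be logged or displayed independently.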

This feature is currently unavailable in Java.

Next steps

Using templates with agents

Agent orchestration

Last updated on 2025-05-23
