How to Stream Agent Responses

2025-05-23

What is a Streamed Response?

A streamed response delivers the message content in small, incremental chunks. This improves the user experience because users can read and interact with the message as it unfolds instead of waiting for the entire response to load. They can start processing information immediately, which makes the application feel more responsive and interactive. Perceived delays shrink, and users stay engaged throughout the exchange.
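As a rough illustration of incremental delivery, the sketch below simulates a streaming source with a plain Python async generator. The names `stream_reply` and `collect` are hypothetical and are not part of Semantic Kernel; the fixed chunk size and the `asyncio.sleep(0)` call merely stand in for a real service that sends pieces over the network.

```python
import asyncio
from collections.abc import AsyncIterator


async def stream_reply(text: str, chunk_size: int = 8) -> AsyncIterator[str]:
    """Yield `text` in small pieces, the way a streaming service might."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # stand-in for network latency between chunks
        yield text[start:start + chunk_size]


async def collect(text: str) -> list[str]:
    """Consume the stream; each chunk could be rendered as soon as it arrives."""
    chunks: list[str] = []
    async for chunk in stream_reply(text):
        chunks.append(chunk)
    return chunks


chunks = asyncio.run(collect("The special soup is Clam Chowder."))
print(chunks)
```

Joining the chunks in order reproduces the full message, which is what a non-streaming call would have returned in one piece.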

Streaming References

Streaming in Semantic Kernel

AI services that support streaming in Semantic Kernel use different content types from those used for fully formed messages. These content types are designed specifically to handle the incremental nature of streaming data. The Agent Framework uses the same content types for the same purpose, which keeps streaming behavior consistent across both systems.
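One way to picture the difference between a streaming content type and a fully formed message is sketched below. `SketchChunk`, `SketchMessage`, and `to_message` are hypothetical stand-ins written for this illustration, not Semantic Kernel types; the sketch only assumes that fragments of one message can be merged in arrival order.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class SketchChunk:
    """Hypothetical streaming fragment: one piece of a message from one role."""
    role: str
    text: str

    def __add__(self, other: "SketchChunk") -> "SketchChunk":
        # Fragments only merge when they belong to the same author role.
        if self.role != other.role:
            raise ValueError("cannot merge chunks from different roles")
        return SketchChunk(self.role, self.text + other.text)


@dataclass(frozen=True)
class SketchMessage:
    """Hypothetical fully formed message, built once streaming has finished."""
    role: str
    content: str


def to_message(chunks: list[SketchChunk]) -> SketchMessage:
    """Merge a non-empty list of fragments, in arrival order, into one message."""
    merged = reduce(lambda left, right: left + right, chunks)
    return SketchMessage(merged.role, merged.text)


message = to_message([SketchChunk("assistant", "Clam "), SketchChunk("assistant", "Chowder")])
print(message)
```

Keeping the fragment type separate from the message type makes it explicit in the code whether a value is still partial or already complete.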

Feature currently unavailable in Java.

Streamed response from `ChatCompletionAgent`

When invoking a streamed response from a ChatCompletionAgent, the ChatHistory in the AgentThread is updated only after the full response has been received. Although the response is streamed incrementally, the history records only the complete message. This ensures that the ChatHistory always contains fully formed responses.

// Define agent
ChatCompletionAgent agent = ...;

ChatHistoryAgentThread agentThread = new();

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's also possible to read the messages that were added to the ChatHistoryAgentThread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread

# Define agent
agent = ChatCompletionAgent(...)

# Create a thread object to maintain the conversation state.
# If no thread is provided one will be created and returned with
# the initial response.
thread: ChatHistoryAgentThread = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread
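The history behavior described for ChatCompletionAgent can be pictured with a small simulation. Everything in this sketch is hypothetical: `SketchThread` and `invoke_stream_sketch` are stand-ins rather than Semantic Kernel classes, and recording the user message at the start of the call is a choice made for the sketch. The point it shows is that the assistant message reaches the history only after the final chunk.

```python
import asyncio
from collections.abc import AsyncIterator


class SketchThread:
    """Hypothetical stand-in for a chat-history thread (not the SK class)."""

    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []


async def invoke_stream_sketch(user_input: str, thread: SketchThread) -> AsyncIterator[str]:
    """Stream a canned reply; append it to the history only when complete."""
    thread.history.append(("user", user_input))
    parts: list[str] = []
    for chunk in ("Hello", ", ", "world", "!"):
        await asyncio.sleep(0)  # stand-in for waiting on the model
        parts.append(chunk)
        yield chunk
    # Runs after the last chunk has been consumed: record the full message once.
    thread.history.append(("assistant", "".join(parts)))


async def demo() -> tuple[list[int], list[tuple[str, str]]]:
    thread = SketchThread()
    sizes: list[int] = []
    async for _ in invoke_stream_sketch("Hi", thread):
        sizes.append(len(thread.history))  # history length observed mid-stream
    return sizes, thread.history


sizes, history = asyncio.run(demo())
print(sizes, history)
```

While chunks are arriving, the history holds only the user message; the single assistant entry appears once the loop has finished.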

Feature currently unavailable in Java.

Streamed response from `OpenAIAssistantAgent`

When invoking a streamed response from an OpenAIAssistantAgent, the assistant maintains the conversation state as a remote thread. The messages can be read from the remote thread if required.

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient);

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

To create a thread from an existing Id, pass it to the constructor of OpenAIAssistantAgentThread:

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient, "your-existing-thread-id");

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation.
# If no thread is provided one will be created and returned with
# the initial response.
thread: AssistantAgentThread = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread

# Read the messages from the remote thread
async for response in thread.get_messages():
  # Process messages...
  pass

# Delete the thread
await thread.delete()

To create a thread from an existing thread_id, pass it to the constructor of AssistantAgentThread:

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation that resumes an existing remote thread.
# `client` is assumed to be the same client instance the agent was created with.
thread = AssistantAgentThread(client=client, thread_id="your-existing-thread-id")

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread

# Delete the thread
await thread.delete()

Feature currently unavailable in Java.

Handling Intermediate Messages with a Streaming Response

Streaming lets an LLM return text in incremental chunks, so a UI or console can render output sooner instead of waiting for the entire response to complete. In addition, a caller may want to handle intermediate content, such as the results of function calls. This can be achieved by supplying a callback function when invoking the streaming response. The callback function receives complete messages, each wrapped in a ChatMessageContent.

Callback documentation for the AzureAIAgent is coming soon.

Configuring the on_intermediate_message callback in agent.invoke_stream(...) lets the caller receive the intermediate messages generated while the agent formulates its final response.

import asyncio
from typing import Annotated

from semantic_kernel.agents import AzureResponsesAgent
from semantic_kernel.contents import AuthorRole, ChatMessageContent, FunctionCallContent, FunctionResultContent
from semantic_kernel.functions import kernel_function


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"

# This callback function will be called for each intermediate message,
# which will allow one to handle FunctionCallContent and FunctionResultContent.
# If the callback is not provided, the agent will return the final response
# with no intermediate tool call steps.
async def handle_streaming_intermediate_steps(message: ChatMessageContent) -> None:
    for item in message.items or []:
        if isinstance(item, FunctionResultContent):
            print(f"Function Result:> {item.result} for function: {item.name}")
        elif isinstance(item, FunctionCallContent):
            print(f"Function Call:> {item.name} with arguments: {item.arguments}")
        else:
            print(f"{item}")

# Simulate a conversation with the agent
USER_INPUTS = [
    "Hello",
    "What is the special soup?",
    "What is the special drink?",
    "How much is it?",
    "Thank you",
]


async def main():
    # 1. Create the client using OpenAI resources and configuration
    client, model = AzureResponsesAgent.setup_resources()

    # 2. Create a Semantic Kernel agent for the OpenAI Responses API
    agent = AzureResponsesAgent(
        ai_model_id=model,
        client=client,
        instructions="Answer questions about the menu.",
        name="Host",
        plugins=[MenuPlugin()],
    )

    # 3. Create a thread for the agent
    # If no thread is provided, a new thread will be
    # created and returned with the initial response
    thread = None

    try:
        for user_input in USER_INPUTS:
            print(f"# {AuthorRole.USER}: '{user_input}'")

            first_chunk = True
            async for response in agent.invoke_stream(
                messages=user_input,
                thread=thread,
                on_intermediate_message=handle_streaming_intermediate_steps,
            ):
                thread = response.thread
                if first_chunk:
                    print(f"# {response.name}: ", end="", flush=True)
                    first_chunk = False
                print(response.content, end="", flush=True)
            print()
    finally:
        await thread.delete() if thread else None

if __name__ == "__main__":
    asyncio.run(main())

The following shows sample output from the agent invocation process:

Sample Output:

# AuthorRole.USER: 'Hello'
# Host: Hello! How can I assist you with the menu today?
# AuthorRole.USER: 'What is the special soup?'
Function Call:> MenuPlugin-get_specials with arguments: {}
Function Result:>
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        for function: MenuPlugin-get_specials
# Host: The special soup today is Clam Chowder. Would you like to know more about it or hear about other specials?
# AuthorRole.USER: 'What is the special drink?'
# Host: The special drink today is Chai Tea. Would you like more details or are you interested in ordering it?
# AuthorRole.USER: 'How much is it?'
Function Call:> MenuPlugin-get_item_price with arguments: {"menu_item":"Chai Tea"}
Function Result:> $9.99 for function: MenuPlugin-get_item_price
# Host: The special drink, Chai Tea, is $9.99. Would you like to order one or need information on something else?
# AuthorRole.USER: 'Thank you'
# Host: You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your day!
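The split between streamed text and callback-delivered intermediate messages can also be pictured without any service dependency. The sketch below is a simulation under stated assumptions: `SketchMessage` and `invoke_stream_sketch` are hypothetical stand-ins, not Semantic Kernel APIs, and the tool call and result are canned values. It shows a caller collecting intermediate messages in a list instead of printing them.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchMessage:
    """Hypothetical stand-in for a complete intermediate message."""
    kind: str  # "function_call" or "function_result"
    payload: str


Callback = Callable[[SketchMessage], Awaitable[None]]


async def invoke_stream_sketch(
    prompt: str, on_intermediate_message: Optional[Callback] = None
) -> AsyncIterator[str]:
    """Simulated agent: tool activity goes to the callback, text is yielded."""
    if on_intermediate_message is not None:
        await on_intermediate_message(SketchMessage("function_call", "get_item_price"))
        await on_intermediate_message(SketchMessage("function_result", "$9.99"))
    for chunk in ("Chai Tea ", "is ", "$9.99."):
        yield chunk


async def demo() -> tuple[list[SketchMessage], str]:
    seen: list[SketchMessage] = []

    async def record(message: SketchMessage) -> None:
        seen.append(message)  # always a complete message, never a partial chunk

    text = "".join([chunk async for chunk in invoke_stream_sketch("How much is it?", record)])
    return seen, text


seen, text = asyncio.run(demo())
print([m.kind for m in seen], text)
```

Collecting the messages rather than printing them keeps tool activity available for logging or display in a separate part of a UI, while the streamed text is rendered as usual.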

Feature currently unavailable in Java.

Next Steps

Using templates with Agents

Agent orchestration
