Compaction

As conversations grow, the token count of the chat history can exceed model context windows or drive up costs. Compaction strategies reduce the size of conversation history while preserving important context, so agents can continue functioning over long-running interactions.

Important

The compaction framework is currently experimental in .NET. To use it, add #pragma warning disable MAAI001 to suppress the experimental API warning.

Important

The compaction framework is currently experimental in Python. Import strategies from agent_framework._compaction.

Why compaction matters

Every call to an LLM includes the full conversation history. Without compaction:

  • Token limits — Conversations eventually exceed the model's context window, causing errors.
  • Cost — Larger prompts consume more tokens, increasing API costs.
  • Latency — More input tokens mean slower response times.

Compaction solves these problems by selectively removing, collapsing, or summarizing older portions of the conversation.

Core concepts

Applicability: In-memory history agents only

Compaction applies only to agents that manage their own conversation history in memory. Agents that rely on service-managed context or conversation state do not benefit from compaction because the service already handles context management. Examples of service-managed agents include:

  • Foundry Agents — context is managed server-side by the Azure AI Foundry service.
  • Responses API with store enabled (the default) — conversation state is stored and managed by the OpenAI service.
  • Copilot Studio agents — conversation context is maintained by the Copilot Studio service.

For these agent types, configuring a compaction strategy has no effect. Compaction is only relevant when the agent maintains its own in-memory message list and passes the full history to the model on each call.

Compaction operates on a MessageIndex — a structured view of the flat message list that groups messages into atomic units called MessageGroup instances. Each group tracks its message count, byte count, and estimated token count.

Message groups

A MessageGroup represents logically related messages that must be kept or removed together. For example, an assistant message containing tool calls and its corresponding tool result messages form an atomic group — removing one without the other would cause LLM API errors.

Each group has a MessageGroupKind:

  • System — One or more system messages. Always preserved during compaction.
  • User — A single user message that starts a new turn.
  • AssistantText — A plain assistant text response (no tool calls).
  • ToolCall — An assistant message with tool calls and the corresponding tool result messages, treated as an atomic unit.
  • Summary — A condensed message produced by summarization compaction.

Triggers

A CompactionTrigger is a delegate that evaluates whether compaction should proceed based on current MessageIndex metrics:

public delegate bool CompactionTrigger(MessageIndex index);

The CompactionTriggers class provides common factory methods:

  • CompactionTriggers.Always — Fires every time (unconditionally).
  • CompactionTriggers.Never — Never fires (disables compaction).
  • CompactionTriggers.TokensExceed(maxTokens) — Fires when the included token count exceeds the threshold.
  • CompactionTriggers.MessagesExceed(maxMessages) — Fires when the included message count exceeds the threshold.
  • CompactionTriggers.TurnsExceed(maxTurns) — Fires when the included user turn count exceeds the threshold.
  • CompactionTriggers.GroupsExceed(maxGroups) — Fires when the included group count exceeds the threshold.
  • CompactionTriggers.HasToolCalls() — Fires when at least one non-excluded tool call group exists.

Combine triggers with CompactionTriggers.All(...) (logical AND) or CompactionTriggers.Any(...) (logical OR):

// Compact only when there are tool calls AND tokens exceed 2000
CompactionTrigger trigger = CompactionTriggers.All(
    CompactionTriggers.HasToolCalls(),
    CompactionTriggers.TokensExceed(2000));

Trigger vs. target

Every strategy has two predicates:

  • Trigger — Controls when compaction begins. If the trigger returns false, the strategy is skipped entirely.
  • Target — Controls when compaction stops. Strategies incrementally exclude groups and re-evaluate the target after each step, stopping as soon as the target returns true.

When no target is specified, it defaults to the inverse of the trigger — compaction stops as soon as the trigger condition would no longer fire.

Compaction operates on a flat list of Message objects. Messages are annotated with lightweight group metadata, and strategies mutate those annotations in place to mark groups as excluded before the message list is projected to the model.

Message groups

Messages are grouped into atomic units. Each group is assigned a GroupKind:

  • system — System messages. Always preserved during compaction.
  • user — A single user message.
  • assistant_text — A plain assistant text response (no function calls).
  • tool_call — An assistant message with function calls plus the corresponding tool result messages, treated as an atomic unit.

Compaction strategies

A CompactionStrategy is a protocol — any async callable that accepts a list[Message] and mutates it in place, returning True when it changed anything:

class CompactionStrategy(Protocol):
    async def __call__(self, messages: list[Message]) -> bool: ...
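
Because the protocol is just an async callable, you can implement your own strategy. The following sketch is a hypothetical example, not a built-in class: it keeps the system messages plus the newest max_messages non-system messages, and it assumes each Message exposes a role attribute (verify against the framework's actual message type). The built-in strategies additionally respect atomic group boundaries, which this sketch omits for brevity.

from __future__ import annotations


class KeepLastNStrategy:
    """Hypothetical custom strategy: keep system messages plus the newest
    `max_messages` non-system messages. Satisfies the CompactionStrategy
    protocol: an async callable that mutates the list in place and returns
    True when it changed something."""

    def __init__(self, max_messages: int = 40) -> None:
        self.max_messages = max_messages

    async def __call__(self, messages: list[Message]) -> bool:
        # Message is the framework's message type; the `role` attribute is assumed here.
        system = [m for m in messages if m.role == "system"]
        others = [m for m in messages if m.role != "system"]
        if len(others) <= self.max_messages:
            return False  # nothing changed
        # Mutate the list in place, as the protocol requires.
        messages[:] = system + others[-self.max_messages:]
        return True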

Tokenizer

Token-aware strategies accept a TokenizerProtocol implementation. The built-in CharacterEstimatorTokenizer uses a 4-character-per-token heuristic:

from agent_framework._compaction import CharacterEstimatorTokenizer

tokenizer = CharacterEstimatorTokenizer()

Pass a custom tokenizer when you need accurate token counts for a specific model's encoding.
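
If you need encoding-accurate counts, you can adapt a real tokenizer such as tiktoken. The sketch below rests on an assumption: it presumes TokenizerProtocol is satisfied by an object exposing a count_tokens(text) method, so check the actual protocol definition and align the method name and signature before using it.

import tiktoken


class TiktokenTokenizer:
    """Hypothetical adapter around tiktoken. The `count_tokens` method name is
    assumed; match it to the real TokenizerProtocol."""

    def __init__(self, encoding_name: str = "cl100k_base") -> None:
        self._encoding = tiktoken.get_encoding(encoding_name)

    def count_tokens(self, text: str) -> int:
        # Number of tokens the chosen encoding produces for this text.
        return len(self._encoding.encode(text))


tokenizer = TiktokenTokenizer()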

Compaction strategies

All strategies inherit from the abstract CompactionStrategy base class. Each strategy preserves system messages and respects a MinimumPreserved floor that protects the most-recent non-system groups from removal.

Compaction strategies are imported from agent_framework._compaction.

TruncationCompactionStrategy

TruncationStrategy

The most straightforward approach: removes the oldest non-system message groups until the target condition is met.

  • Respects atomic group boundaries (tool call and result messages are removed together).
  • Best for hard token-budget backstops.
  • MinimumPreserved defaults to 32.

// Drop oldest groups when tokens exceed 32K, keeping at least 10 recent groups
TruncationCompactionStrategy truncation = new(
    trigger: CompactionTriggers.TokensExceed(32768),
    minimumPreserved: 10);

  • When a tokenizer is provided, the metric is token count; otherwise it is included message count.
  • preserve_system defaults to True.

from agent_framework._compaction import CharacterEstimatorTokenizer, TruncationStrategy

# Exclude oldest groups when tokens exceed 32,000, trimming to 16,000
truncation = TruncationStrategy(
    max_n=32_000,
    compact_to=16_000,
    tokenizer=CharacterEstimatorTokenizer(),
)

SlidingWindowCompactionStrategy

SlidingWindowStrategy

Removes older conversation content to keep only the most recent window of exchanges, respecting logical conversation units rather than arbitrary message counts. System messages are preserved throughout.

  • Best for bounding conversation length predictably.

Removes the oldest user turns and their associated response groups, operating on logical turn boundaries rather than individual groups.

  • A turn starts with a user message and includes all subsequent assistant and tool-call groups until the next user message.
  • MinimumPreserved defaults to 1 (preserves at least the most recent non-system group).

// Keep only the last 4 user turns
SlidingWindowCompactionStrategy slidingWindow = new(
    trigger: CompactionTriggers.TurnsExceed(4));

Keeps only the most recent keep_last_groups non-system groups, excluding everything older.

  • preserve_system defaults to True.

from agent_framework._compaction import SlidingWindowStrategy

# Keep only the last 20 non-system groups
sliding_window = SlidingWindowStrategy(keep_last_groups=20)

ToolResultCompactionStrategy

Collapses older tool-call groups into compact summary messages, preserving a readable trace without the full message overhead.

  • Does not touch user messages or plain assistant responses.
  • Best as a first-pass strategy to reclaim space from verbose tool results.
  • Replaces multi-message tool call groups (assistant call + tool results) with a short summary like [Tool calls: get_weather, search_docs].
  • MinimumPreserved defaults to 2, ensuring the current turn's tool interactions remain visible.

// Collapse old tool results when tokens exceed 512
ToolResultCompactionStrategy toolCompaction = new(
    trigger: CompactionTriggers.TokensExceed(512));
  • Collapses into compact summary messages such as [Tool results: get_weather: sunny, 18°C].
  • The most recent keep_last_tool_call_groups tool-call groups are left untouched.

from agent_framework._compaction import ToolResultCompactionStrategy

# Collapse all but the newest tool-call group
tool_result = ToolResultCompactionStrategy(keep_last_tool_call_groups=1)

SummarizationCompactionStrategy

SummarizationStrategy

Uses an LLM to summarize older portions of the conversation, replacing them with a single summary message.

  • A default prompt preserves key facts, decisions, user preferences, and tool call outcomes.
  • Requires a separate LLM client for summarization — a smaller, faster model is recommended.
  • Best for preserving conversational context while significantly reducing token count.
  • You can provide a custom summarization prompt.
  • Protects system messages and the most recent MinimumPreserved non-system groups (default: 4).
  • Sends the older messages to a separate IChatClient with a summarization prompt, then inserts the summary as a MessageGroupKind.Summary group.

// Summarize older messages when tokens exceed 1280, keeping the last 4 groups
SummarizationCompactionStrategy summarization = new(
    chatClient: summarizerChatClient,
    trigger: CompactionTriggers.TokensExceed(1280),
    minimumPreserved: 4);

You can provide a custom summarization prompt:

SummarizationCompactionStrategy summarization = new(
    chatClient: summarizerChatClient,
    trigger: CompactionTriggers.TokensExceed(1280),
    summarizationPrompt: "Summarize the key decisions and user preferences only.");
  • Triggers when included non-system message count exceeds target_count + threshold.
  • Retains the newest target_count messages; summarizes everything older.
  • Requires a SupportsChatGetResponse client.

from agent_framework._compaction import SummarizationStrategy

# Summarize when non-system message count exceeds 6, retaining the 4 newest
summarization = SummarizationStrategy(
    client=summarizer_client,
    target_count=4,
    threshold=2,
)

Provide a custom summarization prompt:

summarization = SummarizationStrategy(
    client=summarizer_client,
    target_count=4,
    prompt="Summarize the key decisions and user preferences only.",
)

PipelineCompactionStrategy

Composes multiple strategies into a sequential pipeline. Each strategy operates on the result of the previous one, enabling layered compaction from gentle to aggressive.

  • The pipeline's own trigger is CompactionTriggers.Always — each child strategy evaluates its own trigger independently.
  • Strategies execute in order, so put the gentlest strategies first.

PipelineCompactionStrategy pipeline = new(
    new ToolResultCompactionStrategy(CompactionTriggers.TokensExceed(512)),
    new SummarizationCompactionStrategy(summarizerChatClient, CompactionTriggers.TokensExceed(1280)),
    new SlidingWindowCompactionStrategy(CompactionTriggers.TurnsExceed(4)),
    new TruncationCompactionStrategy(CompactionTriggers.TokensExceed(32768)));

This pipeline:

  1. Collapses old tool results (gentle).
  2. Summarizes older conversation spans (moderate).
  3. Keeps only the last 4 user turns (aggressive).
  4. Drops oldest groups if still over budget (emergency backstop).

SelectiveToolCallCompactionStrategy

Fully excludes older tool-call groups, keeping only the last keep_last_tool_call_groups.

  • Does not touch user or plain assistant messages.
  • Best when tool chatter dominates token usage and the full tool history is not needed.

from agent_framework._compaction import SelectiveToolCallCompactionStrategy

# Keep only the most recent tool-call group
selective_tool = SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=1)

TokenBudgetComposedStrategy

Composes multiple strategies into a sequential pipeline driven by a token budget. Each child strategy runs in order, stopping early once the budget is satisfied. A built-in fallback excludes the oldest groups if the strategies alone cannot reach the target.

  • Strategies execute in order; place the gentlest strategies first.
  • early_stop=True (the default) stops as soon as the token budget is satisfied.

from agent_framework._compaction import (
    CharacterEstimatorTokenizer,
    SelectiveToolCallCompactionStrategy,
    SlidingWindowStrategy,
    SummarizationStrategy,
    TokenBudgetComposedStrategy,
    ToolResultCompactionStrategy,
)

tokenizer = CharacterEstimatorTokenizer()

pipeline = TokenBudgetComposedStrategy(
    token_budget=16_000,
    tokenizer=tokenizer,
    strategies=[
        ToolResultCompactionStrategy(keep_last_tool_call_groups=1),
        SummarizationStrategy(client=summarizer_client, target_count=4, threshold=2),
        SlidingWindowStrategy(keep_last_groups=20),
    ],
)

This pipeline:

  1. Collapses old tool results (gentle).
  2. Summarizes older conversation spans (moderate).
  3. Keeps only the last 20 groups (aggressive).
  4. Falls back to oldest-first exclusion if still over budget (emergency backstop).

Using compaction with an agent

Wrap a compaction strategy in a CompactionProvider and register it as an AIContextProvider. Pass either a single strategy or a PipelineCompactionStrategy to the constructor.

Registering with the builder API

Register the provider on the ChatClientBuilder using UseAIContextProviders. The provider runs inside the tool-calling loop, compacting messages before each LLM call.

IChatClient agentChatClient = openAIClient.GetChatClient(deploymentName).AsIChatClient();
IChatClient summarizerChatClient = openAIClient.GetChatClient(deploymentName).AsIChatClient();

PipelineCompactionStrategy compactionPipeline =
    new(
        new ToolResultCompactionStrategy(CompactionTriggers.TokensExceed(512)),
        new SummarizationCompactionStrategy(summarizerChatClient, CompactionTriggers.TokensExceed(1280)),
        new SlidingWindowCompactionStrategy(CompactionTriggers.TurnsExceed(4)),
        new TruncationCompactionStrategy(CompactionTriggers.TokensExceed(32768)));

AIAgent agent =
    agentChatClient
        .AsBuilder()
        .UseAIContextProviders(new CompactionProvider(compactionPipeline))
        .BuildAIAgent(
            new ChatClientAgentOptions
            {
                Name = "ShoppingAssistant",
                ChatOptions = new()
                {
                    Instructions = "You are a helpful shopping assistant.",
                    Tools = [AIFunctionFactory.Create(LookupPrice)],
                },
            });

AgentSession session = await agent.CreateSessionAsync();
Console.WriteLine(await agent.RunAsync("What's the price of a laptop?", session));

Tip

Use a smaller, cheaper model (such as gpt-4o-mini) for the summarization chat client to reduce costs while maintaining summary quality.

If only one strategy is needed, pass it directly to CompactionProvider without wrapping it in a PipelineCompactionStrategy:

agentChatClient
    .AsBuilder()
    .UseAIContextProviders(new CompactionProvider(
        new SlidingWindowCompactionStrategy(CompactionTriggers.TurnsExceed(20))))
    .BuildAIAgent(...);

Registering through ChatClientAgentOptions

The provider can also be specified directly on ChatClientAgentOptions.AIContextProviders:

AIAgent agent = agentChatClient
    .AsBuilder()
    .BuildAIAgent(new ChatClientAgentOptions
    {
        AIContextProviders = [new CompactionProvider(compactionPipeline)]
    });

Note

When registered through ChatClientAgentOptions, the CompactionProvider is not engaged during the tool-calling loop. Agent-level context providers run before chat history is stored, so any synthetic summary messages produced by CompactionProvider can become part of the persisted history when using ChatHistoryProvider. To compact only the in-flight request context while preserving the original stored history, register the provider on the ChatClientBuilder via UseAIContextProviders(...) instead.

Ad-hoc compaction

CompactionProvider.CompactAsync applies a strategy to an arbitrary message list without an active agent session:

IEnumerable<ChatMessage> compacted = await CompactionProvider.CompactAsync(
    new TruncationCompactionStrategy(CompactionTriggers.TokensExceed(8000)),
    existingMessages);

CompactionProvider is a context provider that applies compaction strategies before and after each agent run. Add it alongside a history provider in the agent's context_providers list.

  • before_strategy — runs before the model call, compacting messages already loaded into the context.
  • after_strategy — runs after the model call, compacting the messages stored by the history provider so the next turn starts smaller.
  • history_source_id — the source_id of the history provider whose stored messages after_strategy should compact (defaults to "in_memory").

Registering with an agent

from agent_framework import Agent, CompactionProvider, InMemoryHistoryProvider
from agent_framework._compaction import (
    CharacterEstimatorTokenizer,
    SlidingWindowStrategy,
    SummarizationStrategy,
    TokenBudgetComposedStrategy,
    ToolResultCompactionStrategy,
)

tokenizer = CharacterEstimatorTokenizer()

pipeline = TokenBudgetComposedStrategy(
    token_budget=16_000,
    tokenizer=tokenizer,
    strategies=[
        ToolResultCompactionStrategy(keep_last_tool_call_groups=1),
        SummarizationStrategy(client=summarizer_client, target_count=4, threshold=2),
        SlidingWindowStrategy(keep_last_groups=20),
    ],
)

history = InMemoryHistoryProvider()
compaction = CompactionProvider(
    before_strategy=pipeline,
    history_source_id=history.source_id,
)

agent = Agent(
    client=client,
    name="ShoppingAssistant",
    instructions="You are a helpful shopping assistant.",
    context_providers=[history, compaction],
)

session = agent.create_session()
print(await agent.run("What's the price of a laptop?", session=session))

Tip

Use a smaller, cheaper model (such as gpt-4o-mini) for the summarization client to reduce costs while maintaining summary quality.

If only one strategy is needed, pass it directly as before_strategy:

compaction = CompactionProvider(
    before_strategy=SlidingWindowStrategy(keep_last_groups=20),
    history_source_id=history.source_id,
)

Compacting persisted history after each run

Use after_strategy to compact the messages stored by the history provider so that future turns begin with a reduced context:

compaction = CompactionProvider(
    before_strategy=SlidingWindowStrategy(keep_last_groups=20),
    after_strategy=ToolResultCompactionStrategy(keep_last_tool_call_groups=1),
    history_source_id=history.source_id,
)

Ad-hoc compaction

apply_compaction applies a strategy to an arbitrary message list outside an active agent session:

from agent_framework._compaction import apply_compaction, TruncationStrategy, CharacterEstimatorTokenizer

tokenizer = CharacterEstimatorTokenizer()

compacted = await apply_compaction(
    messages,
    strategy=TruncationStrategy(
        max_n=8_000,
        compact_to=4_000,
        tokenizer=tokenizer,
    ),
    tokenizer=tokenizer,
)

Choosing a strategy

  • ToolResultCompactionStrategy — Low aggressiveness; high context preservation (only collapses tool results); no LLM required. Best for reclaiming space from verbose tool output.
  • SummarizationCompactionStrategy — Medium aggressiveness; medium context preservation (replaces history with a summary); requires an LLM. Best for long conversations where context matters.
  • SlidingWindowCompactionStrategy — High aggressiveness; low context preservation (drops entire turns); no LLM required. Best for hard turn-count limits.
  • TruncationCompactionStrategy — High aggressiveness; low context preservation (drops oldest groups); no LLM required. Best for emergency token-budget backstops.
  • PipelineCompactionStrategy — Configurable aggressiveness; context preservation and LLM requirement depend on child strategies. Best for layered compaction with multiple fallbacks.
  • ToolResultCompactionStrategy — Low aggressiveness; high context preservation (collapses tool results into summary messages); no LLM required. Best for reclaiming space from verbose tool output.
  • SelectiveToolCallCompactionStrategy — Low-to-medium aggressiveness; medium context preservation (fully excludes old tool-call groups); no LLM required. Best for removing tool history when results are no longer needed.
  • SummarizationStrategy — Medium aggressiveness; medium context preservation (replaces history with a summary); requires an LLM. Best for long conversations where context matters.
  • SlidingWindowStrategy — High aggressiveness; low context preservation (drops oldest groups); no LLM required. Best for hard group-count limits.
  • TruncationStrategy — High aggressiveness; low context preservation (drops oldest groups); no LLM required. Best for emergency message- or token-budget backstops.
  • TokenBudgetComposedStrategy — Configurable aggressiveness; context preservation and LLM requirement depend on child strategies. Best for layered compaction with a token-budget goal and multiple fallbacks.

Next steps