How to configure Thread Truncation / Compaction Strategy for Azure AI Foundry Agents?

hok yan tam 45

Environment Setup:

Platform: New Azure AI Foundry (portal)
Architecture: RAG-based Agent (Foundry IQ / Vector Knowledge Base)
Deployment Target: Microsoft Teams (via the built-in "Publish to Teams" connector)

The Problem: I am building a RAG agent in Azure AI new Foundry IQ. Because conversations are durable and store all items, the conversation_id is persisting both the message items and the massive tool_output items (the raw chunks retrieved from the knowledge base) across every turn.

In my testing, this unbounded history accumulates rapidly. After just a few questions, the accumulated context window balloons, leading to a higher and higher token usage.

Currently in the Foundry Playground, I can manually reset the chat to clear the context, but this isn't a setting for me to change the memory setting. I assume we need to get this setting done before deploying to Teams or Copilot.

Any guidance on how to properly implement a "sliding window" or other strategies instead of putting all the stuff as input for the following questions would be greatly appreciated!

0 comments

Answer accepted by question author

Anshika Varshney 12,365 Microsoft External Staff Moderator

Hey hok yan tam,

You’re right to look for a way to control thread size, but currently Azure AI Foundry Agents (service-based threads) work a bit differently.

At the moment, there is no built-in setting in the Foundry portal to automatically truncate, compact, or limit thread history.

The service is designed to persist the entire conversation history for each thread, and it does not provide a native option to automatically trim older messages.

What you can do instead

Control context at request time (recommended)

Even though the full thread is stored, you can control what is actually sent to the model.

When you send a new request, you can:
- only include the latest N messages
- or manually build a smaller message list

This is the most common approach and gives you full control over tokens and context size.

Implement your own truncation or summarization logic

Since the service does not auto-truncate, you can:

Keep only last few turns in your app logic
Or summarize older messages and keep a short summary instead

This helps reduce token usage and keeps conversations efficient.

Manage threads manually

For long-running conversations:

You can create a new thread periodically
Or delete old threads using the API if no longer needed

This helps control storage and avoids very large histories.

Important note

The Foundry portal thread view always shows full history
Any truncation you do in your code will not change what is stored, only what is sent to the model

Summary

No built-in truncation or compaction setting in Foundry Agents today
You must handle it at application level
Best practice is to:
- send only recent messages
- or use summarization for older context

I Hope this helps. Do let me know if you have any further queries.

Thankyou!

Anshika Varshney 12,365 Reputation points Microsoft External Staff Moderator

2026-05-18T07:15:20.03+00:00

Hello hok yan,

Please let me know if there are any remaining questions or additional details, I can help with, I’ll be glad to provide further clarification or guidance.

If these steps resolve your issue, please mark this answer as Accepted so others can easily find the solution.

Thankyou!

Answer accepted by question author

kagiyama yutaka 3,430

I think Foundry just doesn’t give us any option for truncation or a sliding‑window right now. we just trim the history on our side before the next turn.

hok yan tam 45 Reputation points

2026-05-11T12:37:06.7533333+00:00

But there will be a large load of input and tokens used(especially for RAG) when users continue the conversation. For some low reasoning effort agent model, they might even lose attention from the system prompt due to the flooding input and do not follow it.
kagiyama yutaka 3,430 Reputation points

2026-05-11T12:47:14.8033333+00:00

I think Foundry keeps the whole thread, so the only safe fix is trimming the history on ur side — most people keep just the system prompt, the last user msg, and the tiny state the agent really needs, so the model doesn’t get buried under old RAG chunks.
hok yan tam 45 Reputation points

2026-05-11T12:50:27.9466667+00:00

Yup, I agree. So currently nothing can be done from my side?
kagiyama yutaka 3,430 Reputation points

2026-05-11T13:06:07.09+00:00

I think Foundry has no truncation, so you just trim history yourself — that’s it.

1 additional answer

AI answer

2026-05-11T01:14:37.97+00:00
Azure AI Foundry agents (Foundry IQ) use service-managed conversation state. For these agents, context window management and truncation are handled by the service, and custom compaction strategies cannot be configured.

Compaction strategies such as sliding windows, truncation, tool-result compaction, or summarization apply only to agents that manage their own conversation history in memory. Examples of agents where compaction is relevant include those built with the agent framework that maintain an in-memory message list and send that full history to the model on each call.

For service-managed agents, including:

Foundry Agents

Responses API with store enabled (default)

Copilot Studio agents

configuring a compaction strategy has no effect. The service already manages context, including truncation, to stay within token limits.

In this architecture (Foundry IQ RAG agent published to Teams), conversation history and tool outputs are stored and managed by the service. There is no supported way to inject a custom sliding-window or compaction strategy into that pipeline. Token and context management are handled automatically by the platform.

If custom compaction behavior is required (for example, sliding window, tool-result summarization, or truncation), the supported approach is to build an in-memory agent using the agent framework and apply compaction strategies there, rather than relying on service-managed context.

References:

Compaction (programming-language-csharp)

Compaction (programming-language-python)

Azure OpenAI Assistants API (Preview) (classic)

Threads in Azure SRE Agent
AI-generated content may be incorrect. Read our transparency notes for more information.
Was this answer helpful?

Share via

How to configure Thread Truncation / Compaction Strategy for Azure AI Foundry Agents?

1 additional answer

Your answer