Knowledge index in Microsoft Foundry that lets AI agents retrieve grounded information from organization’s data
Hey hok yan tam,
You’re right to look for a way to control thread size, but currently Azure AI Foundry Agents (service-based threads) work a bit differently.
At the moment, there is no built-in setting in the Foundry portal to automatically truncate, compact, or limit thread history.
The service is designed to persist the entire conversation history for each thread, and it does not provide a native option to automatically trim older messages.
What you can do instead
- Control context at request time (recommended)
Even though the full thread is stored, you can control what is actually sent to the model.
- When you send a new request, you can:
- only include the latest N messages
- or manually build a smaller message list
This is the most common approach and gives you full control over tokens and context size.
- Implement your own truncation or summarization logic
Since the service does not auto-truncate, you can:
- Keep only last few turns in your app logic
- Or summarize older messages and keep a short summary instead
This helps reduce token usage and keeps conversations efficient.
- Manage threads manually
For long-running conversations:
- You can create a new thread periodically
- Or delete old threads using the API if no longer needed
This helps control storage and avoids very large histories.
Important note
- The Foundry portal thread view always shows full history
- Any truncation you do in your code will not change what is stored, only what is sent to the model
Summary
- No built-in truncation or compaction setting in Foundry Agents today
- You must handle it at application level
- Best practice is to:
- send only recent messages
- or use summarization for older context
I Hope this helps. Do let me know if you have any further queries.
Thankyou!