Hello Nimesh,
Welcome to Microsoft Q&A. Thank you for reaching out to us.
Thank you for sharing the detailed scenario and observations. Based on the behavior and the error message, this situation is commonly associated with tool response size limits combined with the retrieval and orchestration design pattern.
The following error typically indicates that the response returned by the Custom OpenAPI tool exceeds the message size allowed by the agent orchestration layer:
“tool_user_error: Received message exceeds the maximum configured message size”
This can occur when large payloads (multiple chunks, full search responses, or unnecessary metadata) are returned. Since search responses vary by query, broader queries may intermittently trigger this condition. Returning the full knowledge base in a single response is generally not recommended due to these limits.
Please check whether the following helps:
Stabilizing execution and isolating the behavior
A reliable starting point is to validate the Search API independently:
- Test Azure AI Search queries using Postman or cURL
- Inspect:
  - Payload size
  - Number of returned chunks
  - Included metadata and vector fields
  - Response latency
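To make the inspection step repeatable, the following Python sketch summarizes a saved response body (for example, one exported from Postman or captured with cURL). It assumes the standard REST response shape, where hits are listed under `value` and service metadata keys start with `@search.`. The helper name and the "long list of numbers is probably an embedding" heuristic are illustrative assumptions, not part of any Azure SDK. Latency is easiest to read directly from Postman or from cURL with `-w '%{time_total}'`.

```python
import json


def inspect_search_response(raw_body: bytes) -> dict:
    """Summarize an Azure AI Search response body (hypothetical helper)."""
    payload = json.loads(raw_body)
    results = payload.get("value", [])  # search hits live under "value"
    fields = sorted({name for doc in results for name in doc})
    # Heuristic (assumption): a field holding a long list of numbers is likely a vector.
    vector_fields = sorted({
        name
        for doc in results
        for name, val in doc.items()
        if isinstance(val, list)
        and len(val) > 16
        and all(isinstance(x, (int, float)) for x in val)
    })
    # Service-added metadata such as @search.score or @search.rerankerScore.
    metadata_fields = [f for f in fields if f.startswith("@search.")]
    return {
        "size_bytes": len(raw_body),
        "num_results": len(results),
        "fields": fields,
        "vector_fields": vector_fields,
        "metadata_fields": metadata_fields,
    }


if __name__ == "__main__":
    sample = json.dumps({
        "value": [
            {"@search.score": 1.2, "chunk_id": "a1", "chunk": "text ...",
             "text_vector": [0.01] * 1536},
        ]
    }).encode("utf-8")
    print(inspect_search_response(sample))
```

Running it against responses for a narrow and a broad query usually makes it obvious which one pushes the payload over the limit, and whether vector fields are being returned at all.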
Controlling response size
The most effective mitigation is to keep tool responses compact and targeted:
- Limit results: use $top=3 to $top=5
- Restrict fields: use $select for essential fields (chunk, title, chunk_id, parent_id)
- Exclude unnecessary data:
  - Avoid vector fields such as text_vector
  - Remove extra metadata, scores, or debug fields
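As a sketch of these settings, the following builds a GET request URL for the Azure AI Search REST `docs` endpoint using `$top` and `$select`. The service name, index name, and `api-version` value are placeholders (assumptions); check them against your own service. Vector fields such as `text_vector` are excluded simply by not listing them in `$select`.

```python
from urllib.parse import urlencode

# Hypothetical service/index names; replace with your own.
SERVICE = "my-search-service"
INDEX = "my-index"
API_VERSION = "2024-07-01"  # assumption: a GA version supporting $top/$select

# Only the fields the agent actually needs; text_vector is deliberately absent.
ESSENTIAL_FIELDS = ["chunk", "title", "chunk_id", "parent_id"]


def build_search_url(query: str, top: int = 5) -> str:
    """Build a GET docs URL that caps hits with $top and fields with $select."""
    if not 1 <= top <= 5:
        raise ValueError("keep $top small (1-5) to stay under the tool message limit")
    params = {
        "api-version": API_VERSION,
        "search": query,
        "$top": top,
        "$select": ",".join(ESSENTIAL_FIELDS),
    }
    # Keep "$" and "," readable in the URL; other characters are percent-encoded.
    return (
        f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs?"
        + urlencode(params, safe="$,")
    )
```

For example, `build_search_url("refund policy", top=3)` yields a URL ending in `search=refund+policy&$top=3&$select=chunk,title,chunk_id,parent_id`. The same limits map to `top` and `select` properties if the OpenAPI tool calls the POST `docs/search` endpoint instead.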
Avoiding raw search responses
Instead of returning complete Azure AI Search payloads:
- Retrieve only relevant chunks
- Apply filtering or reranking
- Aggregate or summarize responses
- Return only concise, grounded context
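The following sketch shows one way to reduce a raw search response to concise context before it reaches the agent. It assumes semantic ranking is enabled, so hits carry `@search.rerankerScore` (scale 0 to 4); the 2.0 threshold, the 1,200-character cap, and the `KEEP_FIELDS` list are arbitrary assumptions to tune. It truncates rather than summarizes; an LLM-based summarization step would be a separate design choice.

```python
import json

KEEP_FIELDS = ("chunk_id", "title", "chunk")  # hypothetical essential fields


def to_grounded_context(search_response: dict, min_reranker_score: float = 2.0,
                        max_chunk_chars: int = 1200) -> str:
    """Turn a raw Azure AI Search response into a compact JSON string."""
    hits = search_response.get("value", [])
    # Filter: drop weakly ranked hits; hits without a reranker score are kept.
    relevant = [
        h for h in hits
        if h.get("@search.rerankerScore", min_reranker_score) >= min_reranker_score
    ]
    relevant.sort(key=lambda h: h.get("@search.rerankerScore", 0.0), reverse=True)
    compact = []
    for hit in relevant:
        # Copy only whitelisted fields: vectors, scores, and debug data are dropped.
        item = {k: hit[k] for k in KEEP_FIELDS if k in hit}
        if "chunk" in item and len(item["chunk"]) > max_chunk_chars:
            item["chunk"] = item["chunk"][:max_chunk_chars] + "..."
        compact.append(item)
    # separators=(",", ":") removes the whitespace json.dumps adds by default.
    return json.dumps({"results": compact}, separators=(",", ":"))
```

Whitelisting fields (rather than deleting known-bad ones) keeps the response size stable even if new fields are later added to the index.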
Supporting complete knowledge retrieval safely
When broader knowledge retrieval is required, a single large response can exceed the limits. A more reliable approach is:
- Use pagination or staged retrieval
- Retrieve smaller batches across multiple calls
- Aggregate and process results outside the agent
- Return only the final curated response
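A minimal sketch of staged retrieval handled outside the agent: `fetch_page(skip, top)` is a hypothetical callable you would implement with one Azure AI Search call using `$skip`/`$top`, and deduplicating by `parent_id` is only one possible curation rule, not a prescribed one.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]
# Hypothetical: fetch_page(skip, top) wraps one Azure AI Search call using $skip/$top.
FetchPage = Callable[[int, int], List[Hit]]


def staged_retrieval(fetch_page: FetchPage, batch_size: int = 5,
                     max_batches: int = 4) -> List[Hit]:
    """Pull results in small batches instead of one large response."""
    collected: List[Hit] = []
    for batch in range(max_batches):
        page = fetch_page(batch * batch_size, batch_size)
        collected.extend(page)
        if len(page) < batch_size:  # a short page means there is nothing more to fetch
            break
    return collected


def curate(hits: List[Hit], max_items: int = 5) -> List[Hit]:
    """Keep one chunk per parent document so the final response stays small."""
    seen = set()
    curated: List[Hit] = []
    for hit in hits:
        key = hit.get("parent_id") or hit.get("chunk_id")
        if key in seen:
            continue
        seen.add(key)
        curated.append({"parent_id": hit.get("parent_id"), "chunk": hit.get("chunk")})
        if len(curated) == max_items:
            break
    return curated
```

Only the output of `curate` would be returned to the agent; the intermediate batches never pass through the tool message, which is what keeps each call under the size limit.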
Improving retrieval quality while minimizing size
To maintain accuracy with smaller payloads:
- Use Hybrid Search (keyword + vector)
- Enable semantic ranking / reranking
- Use focused queries (top‑K retrieval)
- Chunk size: ~500–1000 tokens
- Overlap: ~50–100 tokens
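The chunk size and overlap guidance can be illustrated with a simple sliding window. This sketch approximates tokens with whitespace-separated words, which is an assumption; a real pipeline would count tokens with the embedding model's tokenizer, or let an indexer skillset (for example, the Text Split skill) do the splitting at indexing time.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into overlapping windows of roughly chunk_size "tokens".

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each new chunk starts this many words later
    chunks: List[str] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):  # this window already reached the end
            break
    return chunks
```

With the defaults, a 2,000-word document yields three chunks starting at words 0, 700, and 1,400, and each consecutive pair shares 100 words, so a sentence cut at one boundary still appears whole in the neighbouring chunk.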
Introducing a proxy or orchestration layer
A scalable architecture is to introduce a middleware layer, with the following flow:
Agent > OpenAPI Tool > Function / Workflow > Azure AI Search > Aggregation > Final response
This approach provides:
- Controlled payload handling
- Pagination and batching support
- Improved logging and diagnostics
- Better error handling
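The following framework-agnostic sketch shows the core of such a middleware handler, which could be hosted in an Azure Function or similar service (hosting code not shown). `search_fn` is a hypothetical callable wrapping Azure AI Search, and `MAX_TOOL_RESPONSE_BYTES` is an assumed placeholder, since the actual agent message limit is not stated here. The handler adds results only while the serialized response stays under the budget, logs what it did, and turns search failures into a small error payload.

```python
import json
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("search_proxy")

# Assumption: placeholder budget; tune it to the limit your agent actually enforces.
MAX_TOOL_RESPONSE_BYTES = 32_000

# Hypothetical: wraps one Azure AI Search call and returns its hits.
SearchFn = Callable[[str], List[Dict[str, object]]]


def handle_tool_request(query: str, search_fn: SearchFn,
                        max_bytes: int = MAX_TOOL_RESPONSE_BYTES) -> str:
    """Return search results to the agent without ever exceeding max_bytes."""
    try:
        hits = search_fn(query)
    except Exception as exc:  # return a short, readable error instead of a stack trace
        logger.exception("search failed for query %r", query)
        return json.dumps({"error": "search_unavailable", "detail": str(exc)[:200]})

    results: List[Dict[str, object]] = []
    truncated = False
    for hit in hits:
        candidate = results + [{"chunk_id": hit.get("chunk_id"), "chunk": hit.get("chunk")}]
        # Measure with "false" (the longer literal) so the final body never exceeds the estimate.
        body = json.dumps({"results": candidate, "truncated": False}, separators=(",", ":"))
        if len(body.encode("utf-8")) > max_bytes:
            truncated = True
            break
        results = candidate
    logger.info("query=%r hits=%d returned=%d truncated=%s",
                query, len(hits), len(results), truncated)
    return json.dumps({"results": results, "truncated": truncated}, separators=(",", ":"))
```

The `truncated` flag lets the agent (or its instructions) ask a narrower follow-up question instead of failing with the message-size error.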
Thus, to summarise:
The intermittent failures are most likely caused by large tool responses exceeding orchestration limits, with variability depending on query size. The most reliable long-term approach combines:
- Payload optimization
- Focused retrieval
- Summarization/compression
- Pagination (staged retrieval)
- Workflow or middleware-based orchestration
The following references might be helpful; please check them out:
- Connect OpenAPI tools to Microsoft Foundry agents - Microsoft Foundry | Microsoft Learn
- Integrate Azure Functions with Foundry Agents - Microsoft Foundry | Microsoft Learn
- Use Azure Functions with Foundry Agent Service (classic) - Microsoft Foundry (classic) portal | Microsoft Learn
- Tool best practices for Microsoft Foundry Agent Service - Microsoft Foundry | Microsoft Learn
- Run Automated Workflows from Foundry Agents - Azure Logic Apps | Microsoft Learn
Thank you
Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the response was helpful. This will benefit other community members who face the same issue.