Hi Elissa,
Thanks for sharing the details. I can see that the timeout is happening when using GPT‑5‑mini with HIGH reasoning effort in Azure AI Foundry Agents.
This behavior is expected in some cases because higher reasoning effort increases processing time, and in multi-step agent workflows the request may exceed the allowed execution time. [ai.azure.com]
Below are some practical steps and examples to help you troubleshoot:
1. Understand why the timeout is happening
- HIGH reasoning mode takes more time because the model is doing deeper analysis
- Multi-agent workflows increase total execution time
- If the response takes too long, it can hit timeout limits
Also, synchronous agent or tool calls must complete within a limited time window; otherwise they fail with a timeout.
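To see how a fixed time window behaves, here is a minimal sketch that reproduces it on the client side. The helper name `call_with_deadline` and the timeout values are assumptions for illustration only; the actual limit is enforced by the service and is not something this code changes.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_deadline(fn: Callable[[], T], timeout_s: float) -> T:
    """Run fn in a worker thread; raise TimeoutError if it is not done in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"call exceeded {timeout_s}s") from None
    finally:
        # Do not block on the worker. The underlying call keeps running
        # in the background; this only stops the caller from waiting.
        pool.shutdown(wait=False)


# Usage: a stand-in for a slow agent call (hypothetical).
# call_with_deadline(lambda: time.sleep(5), timeout_s=1.0)  # raises TimeoutError
```

Wrapping your own agent call like this makes it easy to test how the rest of your workflow reacts when one synchronous step runs out of time.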
2. Common timeout causes (with examples)
Here are some common real-world causes:
- Long reasoning tasks. Example: using HIGH reasoning for complex analysis or multi-step logic
- Large input payload. Example: sending big JSON with many fields or large context
- Large output generation. Example: generating long responses or detailed reports
- Multi-agent orchestration delays. Example: the step 3 agent waits on step 2 output, which is itself slow
- Slow external tools or APIs. Example: the agent calls a search API or database that takes time
- Waiting for the full response instead of partial output. Example: the backend waits for the complete response before returning
- High latency or region load. Example: the same request works sometimes but times out during peak load
Timeouts can also happen because of network latency or service overload in backend calls.
3. Enable streaming response (very important)
Right now you are waiting for the full completion before anything is returned, which increases the risk of hitting the timeout.
What streaming does:
- Returns output in chunks as soon as it is generated
- User sees partial response immediately
- Reduces perceived timeout issues
How to enable (general approach):
- If using the SDK
  - Enable the streaming option in the response call
  - Example concept: stream = true
- If using the API
  - Use a streaming-supported endpoint or parameter
- If using the UI or Playground
  - Turn on the streaming or real-time output option
Streaming is recommended for long responses because it improves responsiveness while the model is still processing. (Streaming responses are supported for real-time interaction in Foundry agents.) [drafts.cod...thme.cloud]
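As a minimal sketch of the streaming idea: this assumes you reach the model through the `openai` Python package's Responses API, where `stream=True` yields events and text arrives in `response.output_text.delta` events. The Foundry Agents SDK may expose streaming under different names, so treat `client`, the deployment name, and the helper `stream_text` as assumptions.

```python
from typing import Any, Callable, Iterable


def _print_chunk(chunk: str) -> None:
    # Show each piece of text immediately instead of waiting for the end.
    print(chunk, end="", flush=True)


def stream_text(
    events: Iterable[Any],
    on_chunk: Callable[[str], None] = _print_chunk,
) -> str:
    """Forward text deltas as they arrive and return the full text at the end."""
    parts = []
    for event in events:
        # Assumption: text chunks arrive as events of this type,
        # each carrying a `.delta` string. Other event types are ignored.
        if getattr(event, "type", None) == "response.output_text.delta":
            on_chunk(event.delta)
            parts.append(event.delta)
    return "".join(parts)


# Usage (assumes `client` is an already-configured openai client):
# events = client.responses.create(
#     model="gpt-5-mini",            # your deployment name
#     input="Summarize the report",
#     reasoning={"effort": "high"},
#     stream=True,
# )
# full_text = stream_text(events)
```

The total processing time stays the same, but the caller starts receiving output early instead of waiting on one long silent request.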
4. Reduce reasoning effort strategically
Instead of using HIGH everywhere:
- Use MEDIUM or LOW for most steps
- Use HIGH only where deep reasoning is needed
- Combine models if needed (fast + reasoning mix)
This helps balance quality and performance.
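One simple way to apply this is a lookup table that decides the effort per step. This is only a sketch: the step names below are hypothetical, and how the chosen value is passed to the model depends on the SDK you use.

```python
# Hypothetical step names; the values mirror the low/medium/high levels above.
EFFORT_BY_STEP = {
    "classify_request": "low",
    "extract_fields": "low",
    "draft_summary": "medium",
    "final_analysis": "high",  # the only step that needs deep reasoning
}

DEFAULT_EFFORT = "medium"


def effort_for(step: str) -> str:
    """Return the reasoning effort to request for a workflow step."""
    return EFFORT_BY_STEP.get(step, DEFAULT_EFFORT)
```

Keeping the mapping in one place makes it easy to lower the effort for a single step and see whether the timeout goes away.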
5. Break large tasks into smaller steps
Avoid doing everything in one request.
Example:
- Step 1: preprocess input
- Step 2: generate intermediate output
- Step 3: final reasoning
This keeps each step within execution limits.
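The three steps above could be sketched as a small chain, where each function stands in for its own short request. The function bodies are placeholders (assumptions), not real model calls.

```python
from typing import Callable, List


def preprocess(raw: str) -> str:
    # Step 1: plain code, no model call needed. Collapse extra whitespace.
    return " ".join(raw.split())


def generate_intermediate(text: str) -> str:
    # Step 2 placeholder: in practice, a separate low or medium effort request.
    return f"[outline of: {text}]"


def final_reasoning(outline: str) -> str:
    # Step 3 placeholder: in practice, the only high effort request.
    return f"[analysis based on {outline}]"


def run_steps(steps: List[Callable[[str], str]], data: str) -> str:
    """Feed each step's output into the next, one short request at a time."""
    for step in steps:
        data = step(data)
    return data


result = run_steps(
    [preprocess, generate_intermediate, final_reasoning],
    "  quarterly   sales data ",
)
```

Because each step is its own call, a failure or timeout tells you exactly which step to retry, instead of rerunning the whole task.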
6. Optimize token usage
Latency depends on tokens processed.
- Reduce unnecessary input data
- Avoid repeating context
- Limit response length
More tokens mean more processing time.
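For example, conversation history can be trimmed to a budget before each request. This is a rough sketch: the 4-characters-per-token estimate is only a heuristic assumption, and a real tokenizer will give different counts.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English.
    return max(1, len(text) // 4)


def trim_history(messages: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent messages that fit within max_tokens."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so recent context is kept first.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    # Restore chronological order.
    return list(reversed(kept))
```

To limit the response length as well, check whether your API exposes an output cap (the Responses API has `max_output_tokens`, for instance) and set it to what the step actually needs.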
7. Check slow steps in workflow
If your agent uses tools or sub-agents:
- Identify which step is slow
- Check logs or traces
- Optimize that specific part
Often the timeout is caused by a single slow dependency.
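If you do not have traces available yet, a small timing wrapper is enough to find the slow step. This is a sketch using only the standard library; the step names in the usage comment are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

timings: Dict[str, float] = {}


@contextmanager
def timed(name: str) -> Iterator[None]:
    """Record the wall-clock duration of the wrapped block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed steps show up too.
        timings[name] = time.perf_counter() - start


def slowest_step() -> str:
    """Return the name of the step with the longest recorded duration."""
    return max(timings, key=timings.get)


# Usage (hypothetical step names):
# with timed("search_tool"):
#     results = call_search_tool(query)
# with timed("model_call"):
#     answer = call_model(results)
# print(slowest_step(), timings)
```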
8. Check region and performance factors
- Try another region if possible
- Monitor whether the issue is consistent or intermittent
- Check whether it happens in both the portal and your backend
In short: This timeout happens mainly due to:
- high reasoning effort
- long-running workflows
- large input or output
- slow tool execution
Best approach:
- use streaming responses
- reduce reasoning where possible
- break workflow into smaller steps
I hope this helps. Do let me know if you have any further queries.
Thank you!