GPT-5-mini in Azure AI Foundry Agents times out in HIGH reasoning mode but MED has significantly lower quality and slower performance

Elissa Castellanos (NEXTANT LLC) 20 Reputation points Microsoft External Staff
2026-05-08T00:27:18.7833333+00:00

We are currently using GPT-5-mini inside Azure AI Foundry Agents for an enterprise agent workflow that is in development/UAT, with a planned production launch in the next 1–2 months.

Previously, we were running the model using HIGH reasoning effort and were getting significantly better output quality and overall performance. However, we started experiencing severe latency issues that eventually caused the agent process to time out consistently.

We opened a Microsoft support ticket and, after discussing the issue with a Microsoft engineer, were advised to reduce the reasoning effort from HIGH to MED. This change resolved the timeout issue, but introduced two new problems:

  • Lower output quality
  • Slower response times (approximately 1.5 minutes slower in our scenario)

Our goal is to understand whether there is any way to continue using GPT-5-mini with HIGH reasoning effort without hitting these timeout limitations.

Environment details:

  • Azure AI Foundry Agents
  • GPT-5-mini
  • East US 2
  • Python-based orchestrated container backend
  • Sequential multi-agent workflow
  • The failing agent is step 3 of the orchestration
  • Single-request testing (not concurrent load)
  • Using the latest Azure SDK/API version
  • Waiting for full completion (not streaming)
  • Retry logic configured with exponential backoff (8 retries starting at 5 seconds)

Additional observations:

  • The issue reproduces consistently
  • The timeout occurs both from our backend and directly within the Foundry portal
  • The timeout occurs after approximately 10 minutes
  • The prompt payload itself is not especially large (roughly a JSON structure with ~22 fields)

Questions:

  1. Are there known limitations or timeout constraints when using GPT-5-mini with HIGH reasoning effort in Azure AI Foundry Agents?
  2. Is HIGH reasoning expected to have substantially longer latency in agent orchestrations?
  3. Are there recommended configurations, architectural patterns, or timeout settings that would allow HIGH reasoning to complete successfully?
  4. Could this behavior be region-related or tied to current capacity/performance limitations?
  5. Is there guidance on when HIGH reasoning should or should not be used in multi-step agent workflows?

Any recommendations would be greatly appreciated since the current MED configuration does not meet the quality/performance level we were previously achieving with HIGH.

Azure OpenAI in Foundry Models

Answer accepted by question author

Anshika Varshney 12,115 Reputation points Microsoft External Staff Moderator
2026-05-08T04:26:47.5366667+00:00

Hi Elissa,

Thanks for sharing the details. I can see that the timeout is happening when using GPT‑5‑mini with HIGH reasoning effort in Azure AI Foundry Agents.

This behavior is expected in some cases because higher reasoning effort increases processing time, and in multi-step agent workflows the request may exceed the allowed execution time. [ai.azure.com]

Below are some practical steps and examples to help you troubleshoot:

1. Understand why timeout is happening

  • HIGH reasoning mode takes more time because the model is doing deeper analysis
  • Multi-agent workflows increase total execution time
  • If the response takes too long, it can hit timeout limits

Also, synchronous agent or tool calls must complete within a limited time window; otherwise they fail with a timeout.
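To see how the time window interacts with the retry settings from the question (8 retries starting at 5 seconds), here is a rough arithmetic sketch. It assumes the delay doubles on each retry and that every attempt runs to the roughly 10 minute timeout reported above. The doubling factor and the function names are assumptions for illustration, not values confirmed in this thread.

```python
def backoff_delays(retries: int, first_delay_s: float, factor: float = 2.0) -> list[float]:
    """Return the wait before each retry, growing by `factor` each time (assumed doubling)."""
    return [first_delay_s * factor**i for i in range(retries)]


def worst_case_seconds(retries: int, first_delay_s: float, attempt_timeout_s: float) -> float:
    """Total wall time if the first attempt and every retry all run to the timeout."""
    attempts = retries + 1  # the initial call plus each retry
    return attempts * attempt_timeout_s + sum(backoff_delays(retries, first_delay_s))


delays = backoff_delays(retries=8, first_delay_s=5)
total = worst_case_seconds(retries=8, first_delay_s=5, attempt_timeout_s=600)
print(delays)
print(f"waiting: {sum(delays) / 60:.2f} min, worst case: {total / 60:.2f} min")
```

Under these assumptions, retrying a request that always times out adds a lot of wall time without changing the outcome, so it may be worth retrying only on transient errors.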

2. Common timeout causes (with examples)

Here are some common real-world causes:

  • Long reasoning tasks. Example: using HIGH reasoning for complex analysis or multi-step logic.
  • Large input payload. Example: sending a big JSON with many fields or a large context.
  • Large output generation. Example: generating long responses or detailed reports.
  • Multi-agent orchestration delays. Example: the step 3 agent waits on step 2 output, which is itself slow.
  • Slow external tools or APIs. Example: the agent calls a search API or database that takes time.
  • Waiting for the full response instead of partial output. Example: the backend waits for the complete response before returning.
  • High latency or region load. Example: the same request works sometimes but times out during peak load.

Timeout can also happen due to network latency or service overload in backend calls.

3. Enable streaming response (very important)

Right now you are waiting for full completion. This increases timeout risk.

What streaming does:

  • Returns output in chunks as soon as it is generated
  • User sees partial response immediately
  • Reduces perceived timeout issues

How to enable (general approach):

  • If using the SDK
    • Enable the streaming option in the response call
    • Example concept: stream = true
  • If using the API
    • Use a streaming-supported endpoint or parameter
  • If using the UI or Playground
    • Turn on the streaming or real-time output option

Streaming is recommended for long responses because it improves responsiveness while the model is still processing. (Streaming responses are supported for real-time interaction in Foundry agents.) [drafts.cod...thme.cloud]
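As a minimal sketch of what consuming a streamed response looks like on the backend, the code below reads chunks from an iterator and records the time to the first chunk. `fake_stream` is a hypothetical stand-in for whatever streamed iterator your SDK returns; it is not an Azure API, and the chunk texts are placeholders.

```python
import time
from typing import Iterable, Iterator


def fake_stream(chunks: list[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streamed response: yields text pieces with a pause."""
    for chunk in chunks:
        time.sleep(delay_s)
        yield chunk


def consume_stream(stream: Iterable[str]) -> tuple[str, float, float]:
    """Collect chunks; return (full text, seconds to first chunk, total seconds)."""
    start = time.monotonic()
    first_chunk_s = None
    parts = []
    for chunk in stream:
        if first_chunk_s is None:
            first_chunk_s = time.monotonic() - start
        parts.append(chunk)  # a real handler could forward each chunk to the caller here
    total_s = time.monotonic() - start
    if first_chunk_s is None:  # empty stream: no first chunk was ever seen
        first_chunk_s = total_s
    return "".join(parts), first_chunk_s, total_s


text, first_s, total_s = consume_stream(fake_stream(["Step 3 ", "result ", "ready"]))
print(text, f"(first chunk after {first_s:.3f}s, done after {total_s:.3f}s)")
```

Note that streaming changes when output arrives, not how long the model takes in total, so it is worth measuring both numbers before relying on it to avoid a timeout.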

4. Reduce reasoning effort strategically

Instead of using HIGH everywhere:

  • Use MED or LOW for most steps
  • Use HIGH only where deep reasoning is needed
  • Combine models if needed (fast + reasoning mix)

This helps balance quality and performance.
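One way to keep this choice explicit is a small per-step lookup, sketched below. The step names are hypothetical, and the strings "low", "medium", and "high" simply mirror the LOW/MED/HIGH labels used in this thread; check your SDK for the exact parameter name and accepted values.

```python
ALLOWED_EFFORTS = {"low", "medium", "high"}
DEFAULT_EFFORT = "medium"

# Hypothetical step names: only the step that needs deep reasoning is set to "high".
EFFORT_BY_STEP = {
    "extract_fields": "low",
    "final_analysis": "high",
}


def effort_for(step: str) -> str:
    """Return the reasoning effort for a step, falling back to the default."""
    effort = EFFORT_BY_STEP.get(step, DEFAULT_EFFORT)
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unknown reasoning effort {effort!r} for step {step!r}")
    return effort


for step in ("extract_fields", "summarize", "final_analysis"):
    print(step, "->", effort_for(step))
```

Keeping the mapping in one place makes it easy to try HIGH on a single step and compare quality and latency without touching the rest of the workflow.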

5. Break large tasks into smaller steps

Avoid doing everything in one request.

Example:

  • Step 1: preprocess input
  • Step 2: generate intermediate output
  • Step 3: final reasoning

This keeps each step within execution limits.
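The three steps above can be sketched as separate functions that pass a small state dictionary along. The step bodies here are placeholders standing in for model calls, and the field names are made up for illustration.

```python
from typing import Callable

State = dict
Step = Callable[[State], State]


def preprocess(state: State) -> State:
    """Step 1: drop empty fields so later steps see less input."""
    record = {k: v for k, v in state["record"].items() if v not in ("", None)}
    return {**state, "record": record}


def intermediate(state: State) -> State:
    """Step 2: placeholder for a cheaper model call that writes a short summary."""
    summary = ", ".join(f"{k}={v}" for k, v in sorted(state["record"].items()))
    return {**state, "summary": summary}


def final_reasoning(state: State) -> State:
    """Step 3: placeholder for the high-effort call, which now only reads the summary."""
    return {**state, "answer": f"decision based on: {state['summary']}"}


def run_pipeline(steps: list[Step], state: State) -> State:
    for step in steps:
        state = step(state)  # each step is a separate, shorter request
    return state


result = run_pipeline(
    [preprocess, intermediate, final_reasoning],
    {"record": {"customer": "acme", "notes": "", "amount": 120}},
)
print(result["answer"])
```

The design point is that the expensive step receives a reduced input, so it has less to process within its own time limit.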

6. Optimize token usage

Latency depends on tokens processed.

  • Reduce unnecessary input data
  • Avoid repeating context
  • Limit response length

More tokens mean more processing time.
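As a rough illustration of trimming input, the sketch below keeps only the fields a step needs and serializes the JSON compactly. The 22-field payload is synthetic, the field names are placeholders, and character count is only a loose proxy for token count.

```python
import json


def prune_payload(payload: dict, keep: set[str]) -> dict:
    """Keep only the fields the current step actually needs."""
    return {k: v for k, v in payload.items() if k in keep}


def compact_json(payload: dict) -> str:
    """Serialize without the default spaces after ',' and ':'."""
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)


# Synthetic payload with 22 fields, similar in shape to the one described in the question.
payload = {f"field_{i}": f"value {i}" for i in range(22)}
needed = {"field_0", "field_3", "field_7"}  # hypothetical subset for one step

full = compact_json(payload)
small = compact_json(prune_payload(payload, needed))
print(f"full: {len(full)} chars, pruned: {len(small)} chars")
```

For a payload this small the saving on input is modest; with reasoning models the output and reasoning tokens usually dominate, so limiting response length is likely to matter more.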

7. Check slow steps in workflow

If your agent uses tools or sub-agents:

  • Identify which step is slow
  • Check logs or traces
  • Optimize that specific part

Often timeout is caused by one slow dependency.
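If you do not already have traces, a minimal way to find the slow step is to time each one yourself. The sketch below uses only the standard library; the step names and the `time.sleep` calls are placeholders for your real agent calls.

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}


@contextmanager
def timed(name: str):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def slowest(durations: dict[str, float]) -> str:
    """Return the name of the step with the largest duration."""
    return max(durations, key=durations.get)


with timed("step_1"):
    time.sleep(0.01)  # placeholder for the first agent call
with timed("step_2"):
    time.sleep(0.01)  # placeholder for the second agent call
with timed("step_3"):
    time.sleep(0.10)  # placeholder for the slow agent call

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
print("slowest:", slowest(timings))
```

Because the timing is recorded in a `finally` block, a step that fails with a timeout still shows up with its elapsed time.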

8. Check region and performance factors

  • Try another region if possible
  • Monitor if issue is consistent or intermittent
  • Check whether it happens in both the portal and the backend

In short: This timeout happens mainly due to:

  • high reasoning effort
  • long-running workflows
  • large input or output
  • slow tool execution

Best approach:

  • use streaming responses
  • reduce reasoning where possible
  • break workflow into smaller steps

I hope this helps. Do let me know if you have any further queries.

Thank you!
