GPT-5-mini in Azure AI Foundry Agents times out in HIGH reasoning mode but MED has significantly lower quality and slower performance

Elissa Castellanos (NEXTANT LLC) 0 Reputation points Microsoft External Staff
2026-05-08T00:27:18.7833333+00:00

We are currently using GPT-5-mini inside Azure AI Foundry Agents for an enterprise agent workflow that is in development/UAT, with a planned production launch in the next 1–2 months.

Previously, we were running the model using HIGH reasoning effort and were getting significantly better output quality and overall performance. However, we started experiencing severe latency issues that eventually caused the agent process to time out consistently.

We opened a Microsoft support ticket and, after discussing the issue with a Microsoft engineer, were advised to reduce the reasoning effort from HIGH to MED. This change resolved the timeout issue, but introduced two new problems:

  • Lower output quality
  • Slower response times (approximately 1.5 minutes slower in our scenario)

Our goal is to understand whether there is any way to continue using GPT-5-mini with HIGH reasoning effort without hitting these timeout limitations.

Environment details:

  • Azure AI Foundry Agents
  • GPT-5-mini
  • East US 2
  • Python-based orchestrated container backend
  • Sequential multi-agent workflow
  • The failing agent is step 3 of the orchestration
  • Single-request testing (not concurrent load)
  • Using the latest Azure SDK/API version
  • Waiting for full completion (not streaming)
  • Retry logic configured with exponential backoff (8 retries starting at 5 seconds)

Additional observations:

  • The issue reproduces consistently
  • The timeout occurs both from our backend and directly within the Foundry portal
  • The timeout occurs after approximately 10 minutes
  • The prompt payload itself is not especially large (roughly a JSON structure with ~22 fields)

Questions:

  1. Are there known limitations or timeout constraints when using GPT-5-mini with HIGH reasoning effort in Azure AI Foundry Agents?
  2. Is HIGH reasoning expected to have substantially longer latency in agent orchestrations?
  3. Are there recommended configurations, architectural patterns, or timeout settings that would allow HIGH reasoning to complete successfully?
  4. Could this behavior be region-related or tied to current capacity/performance limitations?
  5. Is there guidance on when HIGH reasoning should or should not be used in multi-step agent workflows?

Any recommendations would be greatly appreciated since the current MED configuration does not meet the quality/performance level we were previously achieving with HIGH. This iss

Azure OpenAI in Foundry Models

1 answer

  1. Sina Salam 29,016 Reputation points Volunteer Moderator
    2026-05-13T16:42:19.3966667+00:00

    Hello Elissa Castellanos (NEXTANT LLC),

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that GPT-5-mini in Azure AI Foundry Agents times out for you at HIGH reasoning effort, while MED gives significantly lower quality and slower performance.

    Let me start with your questions first:

    1. Are there known limitations or timeout constraints when using GPT-5-mini with HIGH reasoning effort in Azure AI Foundry Agents?

      Microsoft's documentation states that reasoning models spend more time processing, and that higher reasoning effort increases both processing time and hidden reasoning-token usage. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning, https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/runtime-components For tool runs specifically, Microsoft documents a 10-minute run expiration for tool-output submission. Because your timeout does not appear to involve external tools, that tool-output rule should not be presented as the confirmed cause. - https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/tools/function-calling
    2. Is HIGH reasoning expected to have substantially longer latency in agent orchestrations?

      Yes. Higher reasoning effort makes reasoning models spend longer processing the request and generally increases hidden reasoning-token usage. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning
    3. Are there recommended configurations, architectural patterns, or timeout settings that would allow HIGH reasoning to complete successfully?

      Yes. Use background responses with polling/resumption and optionally streaming with resumption. This is the recommended production pattern for long-running reasoning tasks and client timeout resilience. - https://learn.microsoft.com/en-us/agent-framework/agents/background-responses, https://devblogs.microsoft.com/agent-framework/handling-long-running-operations-with-background-responses/
    4. Could this behavior be region-related or tied to current capacity/performance limitations?

      Possibly, but this needs to be verified. Quotas and limits are regional, per subscription, model, and deployment type, and Standard/Global/Data Zone deployments can experience latency variability under capacity pressure. Use Azure Monitor latency/token metrics and 429/capacity indicators to validate. - https://learn.microsoft.com/en-us/azure/foundry/openai/quotas-limits, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/quota
    5. Is there guidance on when HIGH reasoning should or should not be used in multi-step agent workflows?

      Use HIGH only on the smallest, highest-value reasoning step where quality materially depends on it. For latency-sensitive or production workflows, run that HIGH step asynchronously/backgrounded rather than synchronously. Reasoning models are best for complex problem-solving, document comparison, coding, and workflow-management tasks, but latency/cost must be budgeted. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/latency
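
    As a rough illustration of point 5, the per-step effort can be kept in one table so that only the step that really needs it requests HIGH. This is a minimal sketch, not Foundry's configuration format: the step names and the build_request helper are hypothetical, and the reasoning={"effort": ...} shape assumes the Responses API convention, where MED is spelled "medium".

```python
# Hypothetical step names for a sequential workflow; only the last step needs deep reasoning.
STEP_EFFORT = {
    "extract_fields": "low",
    "validate_fields": "medium",
    "draft_recommendation": "high",
}

ALLOWED_EFFORTS = {"low", "medium", "high"}


def build_request(step: str, model: str, user_input: str) -> dict:
    """Return keyword arguments for a Responses-style call for one workflow step."""
    effort = STEP_EFFORT.get(step, "medium")  # unlisted steps fall back to medium
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": model,
        "input": user_input,
        "reasoning": {"effort": effort},
        # Assumption: only the HIGH step is submitted as a background request.
        "background": effort == "high",
    }
```

    Keeping the mapping in one place makes it easy to compare MED and HIGH on a single step without touching the rest of the orchestration.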

    So, regarding your clarification, the timeout does not appear to come from multi-agent orchestration, external tools, or payload size. Since it also occurs in the Foundry portal with the agent isolated, the failing point is most likely the GPT-5-mini HIGH-reasoning generation step itself. - https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/runtime-components, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning

    HIGH reasoning can take much longer because reasoning models spend extra processing time and use hidden reasoning tokens before producing the final answer. The production-safe fix is to run this step asynchronously using background responses with polling/resumption, rather than waiting synchronously for completion.

    Test the same agent, input, model, and deployment using the Azure OpenAI Responses/Foundry Agent background pattern, for example: response = client.responses.create(..., background=True), then poll with client.responses.retrieve(response.id). If it completes there, use that pattern as the production architecture. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses, https://devblogs.microsoft.com/agent-framework/handling-long-running-operations-with-background-responses/
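
    The create-then-retrieve pattern above could be wrapped in a small polling loop along these lines. Treat it as a sketch under assumptions: run_in_background, its parameters, and the set of terminal status strings are illustrative, and client is assumed to be any object exposing responses.create and responses.retrieve as in the calls quoted above.

```python
import time

# Assumed terminal states; check the status values your API version actually returns.
TERMINAL_STATES = {"completed", "failed", "cancelled", "incomplete"}


def run_in_background(client, *, poll_interval=5.0, max_wait=1800.0,
                      sleep=time.sleep, clock=time.monotonic, **create_kwargs):
    """Start a background response and poll until it finishes or max_wait elapses."""
    response = client.responses.create(background=True, **create_kwargs)
    deadline = clock() + max_wait
    while response.status not in TERMINAL_STATES:
        if clock() >= deadline:
            raise TimeoutError(
                f"response {response.id} still {response.status!r} after {max_wait}s"
            )
        sleep(poll_interval)
        # Each poll is a short request, so no single HTTP call stays open for minutes.
        response = client.responses.retrieve(response.id)
    return response
```

    The caller decides how long to wait through max_wait, so the overall budget is set by your own code rather than by how long one synchronous call can stay open.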

    If it still fails at about 10 minutes, capture the response/request ID, timestamp, model version, deployment type, Azure Monitor latency/token metrics, and error details, then continue with Microsoft Support as a service or deployment-capacity issue. Also review quota/capacity, consider Provisioned Throughput for predictable latency, and monitor Foundry model/version updates. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/latency, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/quota, https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/model-versions
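
    To make that evidence easy to attach to a support ticket, each attempt can be recorded in a small structure like the one below. This is only a sketch: RunDiagnostics, timed_call, and every field name are hypothetical, and Azure Monitor metrics would still need to be collected separately.

```python
import json
import time
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunDiagnostics:
    # Field names are illustrative; map them to whatever your SDK exposes.
    response_id: str
    started_at_utc: str
    elapsed_seconds: float
    model: str
    deployment_type: str
    reasoning_effort: str
    status: str
    error: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def timed_call(call, *, model, deployment_type, reasoning_effort):
    """Run call() and return (result, RunDiagnostics), recording failures as well."""
    started = datetime.now(timezone.utc).isoformat()
    t0 = time.monotonic()
    result, status, error = None, "unknown", None
    try:
        result = call()
        status = getattr(result, "status", "completed")
    except Exception as exc:  # the error is captured, not re-raised, so it can be logged
        status, error = "error", f"{type(exc).__name__}: {exc}"
    diag = RunDiagnostics(
        response_id=getattr(result, "id", ""),
        started_at_utc=started,
        elapsed_seconds=time.monotonic() - t0,
        model=model,
        deployment_type=deployment_type,
        reasoning_effort=reasoning_effort,
        status=status,
        error=error,
    )
    return result, diag
```

    Logging diag.to_json() for both the HIGH and MED runs gives a side-by-side record of elapsed time and failure details for the same input.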

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.
