GPT-5-mini in Azure AI Foundry Agents times out in HIGH reasoning mode but MED has significantly lower quality and slower performance

Question

Elissa Castellanos (NEXTANT LLC) 0 Reputation points Microsoft External Staff

We are currently using GPT-5-mini inside Azure AI Foundry Agents for an enterprise agent workflow that is in development/UAT, with a planned production launch in the next 1–2 months.

Previously, we were running the model using HIGH reasoning effort and were getting significantly better output quality and overall performance. However, we started experiencing severe latency issues that eventually caused the agent process to time out consistently.

We opened a Microsoft support ticket and, after discussing the issue with a Microsoft engineer, were advised to reduce the reasoning effort from HIGH to MED. This change resolved the timeout issue, but introduced two new problems:

Lower output quality
Slower response times (approximately 1.5 minutes slower in our scenario)

Our goal is to understand whether there is any way to continue using GPT-5-mini with HIGH reasoning effort without hitting these timeout limitations.

Environment details:

Azure AI Foundry Agents
GPT-5-mini
East US 2
Python-based orchestrated container backend
Sequential multi-agent workflow
The failing agent is step 3 of the orchestration
Single-request testing (not concurrent load)
Using the latest Azure SDK/API version
Waiting for full completion (not streaming)
Retry logic configured with exponential backoff (8 retries starting at 5 seconds)

Additional observations:

The issue reproduces consistently
The timeout occurs both from our backend and directly within the Foundry portal
The timeout occurs after approximately 10 minutes
The prompt payload itself is not especially large (roughly a JSON structure with ~22 fields)

Questions:

Are there known limitations or timeout constraints when using GPT-5-mini with HIGH reasoning effort in Azure AI Foundry Agents?
Is HIGH reasoning expected to have substantially longer latency in agent orchestrations?
Are there recommended configurations, architectural patterns, or timeout settings that would allow HIGH reasoning to complete successfully?
Could this behavior be region-related or tied to current capacity/performance limitations?
Is there guidance on when HIGH reasoning should or should not be used in multi-step agent workflows?

Any recommendations would be greatly appreciated since the current MED configuration does not meet the quality/performance level we were previously achieving with HIGH.

Anshika Varshney 11,225 Reputation points Microsoft External Staff Moderator

2026-05-08T04:26:47.5366667+00:00
Hi Elissa,

Thanks for sharing the details. I can see that the timeout is happening when using GPT‑5‑mini with HIGH reasoning effort in Azure AI Foundry Agents.

This behavior is expected in some cases because higher reasoning effort increases processing time, and in multi-step agent workflows the request may exceed the allowed execution time. [ai.azure.com]

Below are some practical steps and examples to help you troubleshoot:

1. Understand why timeout is happening

HIGH reasoning mode takes more time because the model is doing deeper analysis

Multi-agent workflows increase total execution time

If the response takes too long, it can hit timeout limits

Also, synchronous agent or tool calls must complete within a limited time window, otherwise they fail with timeout.

2. Common timeout causes (with examples)

Here are some common real-world causes:

Long reasoning tasks. Example: using HIGH reasoning for complex analysis or multi-step logic.

Large input payload. Example: sending a big JSON with many fields or a large context.

Large output generation. Example: generating long responses or detailed reports.

Multi-agent orchestration delays. Example: the step 3 agent waits on step 2 output, which is itself slow.

Slow external tools or APIs. Example: an agent calling a search API or database that takes time.

Waiting for the full response instead of partial output. Example: the backend waits for the complete response before returning.

High latency or region load. Example: the same request works sometimes but times out during peak load.

Timeout can also happen due to network latency or service overload in backend calls.

3. Enable streaming response (very important)

Right now you are waiting for full completion. This increases timeout risk.

What streaming does:

Returns output in chunks as soon as it is generated

User sees partial response immediately

Reduces perceived timeout issues

How to enable (general approach):

If using the SDK: enable the streaming option in the response call (conceptually, stream = true).

If using the API: use a streaming-supported endpoint or parameter.

If using the UI or Playground: turn on the streaming or real-time output option.

Streaming is recommended for long responses because it improves responsiveness while the model is still processing. (Streaming responses are supported for real-time interaction in Foundry agents.) [drafts.cod...thme.cloud]
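As a rough illustration of what consuming a stream looks like in Python, here is a minimal sketch. It assumes an OpenAI-compatible Responses client where a call such as `client.responses.create(..., stream=True)` yields events carrying a `type` and, for text chunks, a `delta`. Whether the Foundry Agents SDK exposes streaming in exactly this shape is an assumption, so the stream below is a stand-in and the sketch runs without credentials.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_streamed_text(events: Iterable) -> str:
    """Join the text deltas from a stream of events into the final text."""
    parts = []
    for event in events:
        # Assumed event shape: text chunks arrive as events whose type is
        # "response.output_text.delta" and whose text is in `.delta`.
        if getattr(event, "type", "") == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in for the SDK stream (hypothetical events, not real service output).
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta='{"status": '),
    SimpleNamespace(type="response.output_text.delta", delta='"ok"}'),
    SimpleNamespace(type="response.completed"),
]

print(collect_streamed_text(fake_stream))
```

Streaming keeps the connection active while output is produced, but it does not shorten the time the model spends reasoning before the first text chunk.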

4. Reduce reasoning effort strategically

Instead of using HIGH everywhere:

Use MED or LOW for most steps

Use HIGH only where deep reasoning is needed

Combine models if needed (fast + reasoning mix)

This helps balance quality and performance.
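One way to apply this is a small per-step lookup, sketched below in Python. The step names are hypothetical, and the `reasoning` option shape follows the OpenAI-style Responses API (`{"effort": "low" | "medium" | "high"}`); how your agent definition accepts this setting is an assumption.

```python
# Default effort for any step that is not listed explicitly.
DEFAULT_EFFORT = "medium"

# Hypothetical step names; only the step that needs deep reasoning gets "high".
EFFORT_BY_STEP = {
    "preprocess": "low",
    "final_reasoning": "high",
}


def effort_for(step_name: str) -> str:
    """Return the reasoning effort to use for a given workflow step."""
    return EFFORT_BY_STEP.get(step_name, DEFAULT_EFFORT)


def request_options(step_name: str) -> dict:
    """Build the reasoning options to pass along with the model call."""
    return {"reasoning": {"effort": effort_for(step_name)}}


for step in ("preprocess", "intermediate", "final_reasoning"):
    print(step, request_options(step))
```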

5. Break large tasks into smaller steps

Avoid doing everything in one request.

Example:

Step 1: preprocess input

Step 2: generate intermediate output

Step 3: final reasoning

This keeps each step within execution limits.

6. Optimize token usage

Latency depends on tokens processed.

Reduce unnecessary input data

Avoid repeating context

Limit response length

More tokens mean more processing time.
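A quick way to trim input without changing its meaning is to send compact JSON instead of pretty-printed JSON. The Python sketch below uses a made-up 22-field payload, and the token estimate relies on a rough 4-characters-per-token heuristic, which is an assumption and not the model's real tokenizer.

```python
import json


def compact_json(payload: dict) -> str:
    """Serialize without indentation or spaces after separators."""
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough size estimate; a real tokenizer will give different counts."""
    return max(1, round(len(text) / chars_per_token))


# Hypothetical payload with 22 fields, for illustration only.
payload = {f"field_{i}": f"value {i}" for i in range(22)}

pretty = json.dumps(payload, indent=2)
compact = compact_json(payload)

print("pretty :", len(pretty), "chars, about", rough_token_estimate(pretty), "tokens")
print("compact:", len(compact), "chars, about", rough_token_estimate(compact), "tokens")
```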

7. Check slow steps in workflow

If your agent uses tools or sub-agents:

Identify which step is slow

Check logs or traces

Optimize that specific part

Often timeout is caused by one slow dependency.
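If traces are not available, a simple timer around each step can show where the time goes. This is a minimal Python sketch using only the standard library; the step names are hypothetical and the `time.sleep` calls stand in for your real agent calls.

```python
import time
from contextlib import contextmanager

# Elapsed wall-clock seconds per step, filled in by `timed`.
timings = {}


@contextmanager
def timed(step_name: str):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step_name] = time.perf_counter() - start


# Placeholders for real agent calls.
with timed("step_1_preprocess"):
    time.sleep(0.01)
with timed("step_3_final_agent"):
    time.sleep(0.03)

slowest = max(timings, key=timings.get)
print("slowest step:", slowest)
```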

8. Check region and performance factors

Try another region if possible

Monitor if issue is consistent or intermittent

Check whether it happens in both the portal and the backend

In short: This timeout happens mainly due to:

high reasoning effort

long-running workflows

large input or output

slow tool execution

Best approach:

use streaming responses

reduce reasoning where possible

break workflow into smaller steps

I hope this helps. Do let me know if you have any further queries.

Thank you!

Anshika Varshney 11,225 Reputation points Microsoft External Staff Moderator

2026-05-11T03:30:59.44+00:00

Hi Elissa,

We haven’t heard from you since the last response and are just checking back to see if you have a resolution yet. If you do have a resolution, please share it with the community, as it can be helpful to others. Otherwise, please respond with more details and we will try to help.

Thank you!

Anshika Varshney 11,225 Reputation points Microsoft External Staff Moderator

2026-05-12T08:34:54.5466667+00:00

Hi Elissa,

Just checking back to see if you’re still facing the same issue. If the problem persists, please share a few more details and we’ll be happy to help you further.

Thank you!

Elissa Castellanos (NEXTANT LLC) 0 Reputation points Microsoft External Staff

2026-05-12T19:52:09.2133333+00:00

Hello @Anshika Varshney, hope you are doing well. These are the actions we have been able to take based on your previous feedback:

1. Understand why timeout is happening

This has been difficult for us since the timeouts are happening directly in the Foundry portal when running the agent on its own as a single agent, so we don't believe it is related to the multi-step orchestration.

2. Common timeout causes (with examples)

Our input is generally a JSON file that is not extremely long and, as mentioned before, it only causes timeouts with the reasoning effort set to HIGH. The output it should generate is a JSON that is much shorter than the input. Our workflow is sequential: each agent's processing doesn't begin until the previous one has finished. We are not calling any external tools and have isolated the issue to the agent that uses gpt-5-mini.

3. Enable streaming response (very important)

We looked for this option within the SDK and were not able to enable it. Could you provide further guidance?

4. Reduce reasoning effort strategically

Our agent is the last step of the process, and as mentioned before, using MED or LOW reasoning effort has reduced our quality significantly.

5. Break large tasks into smaller steps

This is already happening in our workstream. The agent with gpt-5-mini is the last one in the process.

6. Optimize token usage

There is only one call to this agent, so no context is repeated anywhere; it is the first and only call we make per process.

7. Check slow steps in workflow

No external tools are used; the agent times out without any dependencies.

8. Check region and performance factors

Changing regions is not possible for us because we have many other agents hosted in this same region and subscription.

If you have any other recommendations on how we can troubleshoot this issue it would be highly appreciated.

We also wanted to know: is there any way we can keep ourselves informed about recent changes to Foundry models or other updates to this tool? We would like to mitigate, as much as possible, the risk of model changes affecting our workflow.

Thanks again for your help.
Anshika Varshney 11,225 Reputation points Microsoft External Staff Moderator

2026-05-13T12:08:42.16+00:00
Thank you for the detailed update; this really helps to narrow things down. I understand this is happening even with a single agent in the portal and only when using high reasoning, which makes this a bit different from typical timeout scenarios.

Based on what you shared, here are a few additional things you can try:

About timeouts with high reasoning

With high reasoning, the model takes more time internally to process and generate the answer.

Even if your input and output are not very large, the internal reasoning steps can still be heavy, which can lead to timeouts in the portal itself.

Since you already confirmed this happens only with high reasoning, this strongly suggests the cause is compute time rather than input size or workflow issues.

Try reducing max output tokens

Even if your expected output is small, the model may still reserve space for larger responses.

You can try:

explicitly setting a lower max tokens value

for example around 200 to 500 depending on your output

This helps reduce overall processing time and can avoid timeouts.
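One thing to verify before applying a low cap: in the OpenAI Responses API, `max_output_tokens` is documented as covering reasoning tokens as well as visible output, so with HIGH effort a cap of a few hundred tokens may end the response as incomplete before any answer text appears. Whether the Foundry Agents runtime exposes the same fields is an assumption here. A minimal Python sketch of the check, using plain dictionaries as stand-ins for real responses:

```python
def hit_output_cap(response: dict) -> bool:
    """True when a response stopped because it reached the output-token cap.

    Assumed shape (Responses API style): a `status` field and, for
    incomplete responses, `incomplete_details.reason`.
    """
    details = response.get("incomplete_details") or {}
    return (
        response.get("status") == "incomplete"
        and details.get("reason") == "max_output_tokens"
    )


# Hypothetical responses for illustration, not real service output.
capped = {"status": "incomplete", "incomplete_details": {"reason": "max_output_tokens"}}
finished = {"status": "completed", "incomplete_details": None}

print(hit_output_cap(capped), hit_output_cap(finished))
```

If the check reports a capped response, raising the cap is likely a better next step than lowering it further.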

Simplify the instruction slightly

Even small changes in instructions can impact reasoning time.

You can try:

making the instruction more direct

removing unnecessary explanation or context

guiding the model to produce structured output faster

This helps the model spend less time in reasoning loops.

Streaming support clarification

Right now, streaming is not always exposed directly in the Foundry portal experience.

Streaming is typically used in API or SDK calls, not in the portal UI itself.

So, if you are testing inside the portal, it is expected that you do not see a streaming option there.

Try testing via API

If possible, try calling the same deployment using API instead of only the portal.

This helps confirm:

whether the timeout is UI related

or truly coming from the model execution

Sometimes portal limits are stricter compared to API calls.

Check token usage in practice

Even if you believe input is small, it is helpful to verify:

full JSON size in tokens

hidden system instructions added by the agent

any formatting or schema instructions

These can silently increase total tokens and processing time.
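When a run does complete (for example at MED), the usage block on the response is the most direct way to see these numbers. The sketch below assumes the Responses API usage shape (`input_tokens`, `output_tokens`, and `output_tokens_details.reasoning_tokens`); the figures are made up for illustration.

```python
def summarize_usage(usage: dict) -> dict:
    """Split reported usage into input, hidden reasoning, and visible output."""
    details = usage.get("output_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    output = usage.get("output_tokens", 0)
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "reasoning_tokens": reasoning,
        # Reasoning tokens are counted inside output_tokens in this shape.
        "visible_output_tokens": output - reasoning,
    }


# Made-up numbers, only to show the shape of the result.
sample_usage = {
    "input_tokens": 1800,
    "output_tokens": 9000,
    "output_tokens_details": {"reasoning_tokens": 8600},
}

summary = summarize_usage(sample_usage)
print(summary)
```

A large gap between reasoning tokens and visible output tokens would support the view that the time is being spent in hidden reasoning and not in the input or the final JSON.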

Model behavior note

For GPT-5 mini and similar models:

high reasoning mode increases latency significantly

this is expected behavior for complex reasoning tasks

some scenarios may hit time limits even with moderate input

About staying updated on changes

This is a great question.

You can track updates here:

Azure AI Foundry documentation

https://ai.azure.com/catalog/models

These pages are updated regularly with new features and model changes.

I hope this helps you get back on track! If you're still facing issues, could you share more details?

Thank you!

Anshika Varshney 11,225 Reputation points Microsoft External Staff Moderator

2026-05-14T05:20:40.7666667+00:00

Hi Elissa,

Please let me know if the issue persists after these checks. If you have any remaining questions or need additional details, I’ll be glad to provide further clarification or guidance. If the above steps resolve your issue, kindly confirm.

Answer 1

Hello Elissa Castellanos (NEXTANT LLC),

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that your GPT-5-mini in Azure AI Foundry Agents times out in HIGH reasoning mode but MED has significantly lower quality and slower performance.

Let me start with your questions first:

Are there known limitations or timeout constraints when using GPT-5-mini with HIGH reasoning effort in Azure AI Foundry Agents?

Microsoft's documentation confirms that reasoning models spend more time processing, and that higher reasoning effort increases both processing time and hidden reasoning-token usage. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning, https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/runtime-components For tool runs specifically, Microsoft documents a 10-minute run expiration for tool-output submission. Your agent does not call tools, so that exact tool-output rule should not be presented as the confirmed cause of your timeout. - https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/tools/function-calling

Is HIGH reasoning expected to have substantially longer latency in agent orchestrations?

Yes. Higher reasoning effort makes reasoning models spend longer processing the request and generally increases hidden reasoning-token usage. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning

Are there recommended configurations, architectural patterns, or timeout settings that would allow HIGH reasoning to complete successfully?

Yes. Use background responses with polling/resumption and, optionally, streaming with resumption. This is the recommended production pattern for long-running reasoning tasks and for resilience against client timeouts. - https://learn.microsoft.com/en-us/agent-framework/agents/background-responses, https://devblogs.microsoft.com/agent-framework/handling-long-running-operations-with-background-responses/

Could this behavior be region-related or tied to current capacity/performance limitations?

Possibly, but it needs to be verified. Quotas and limits are regional and apply per subscription, model, and deployment type, and Standard/Global/Data Zone deployments can experience latency variability under capacity pressure. Use Azure Monitor latency/token metrics and 429/capacity indicators to validate. - https://learn.microsoft.com/en-us/azure/foundry/openai/quotas-limits, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/quota

Is there guidance on when HIGH reasoning should or should not be used in multi-step agent workflows?

Use HIGH only on the smallest, highest-value reasoning step where quality materially depends on it. For latency-sensitive or production workflows, run that HIGH step asynchronously in the background instead of synchronously. Reasoning models are best for complex problem-solving, document comparison, coding, and workflow-management tasks, but latency and cost must be budgeted. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/latency

So, regarding your clarification, the timeout does not appear to come from multi-agent orchestration, external tools, or payload size. Since it also occurs in the Foundry portal with the agent isolated, the failing point is most likely the GPT-5-mini HIGH-reasoning generation step itself. - https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/runtime-components, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/reasoning

HIGH reasoning can take much longer because reasoning models spend extra processing time and use hidden reasoning tokens before producing the final answer. The production-safe fix is to run this step asynchronously using background responses with polling/resumption, rather than waiting synchronously for completion.

Test the same agent, input, model, and deployment using the Azure OpenAI Responses/Foundry Agent background pattern, for example: response = client.responses.create(..., background=True), then poll with client.responses.retrieve(response.id). If it completes there, use that pattern as the production architecture. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses, https://devblogs.microsoft.com/agent-framework/handling-long-running-operations-with-background-responses/
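The polling half of that pattern can be sketched in Python as below. This is a minimal sketch under assumptions, not a definitive implementation: `retrieve` stands for whatever fetches the response by ID (for example a thin wrapper around `client.responses.retrieve`), the status names follow the Responses API, and the interval and deadline values are arbitrary. A fake `retrieve` is used so the sketch runs without credentials.

```python
import time
from typing import Callable

# Statuses after which polling should stop (Responses API naming assumed).
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def poll_until_done(
    retrieve: Callable[[str], dict],
    response_id: str,
    interval_s: float = 5.0,
    deadline_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a background response until it reaches a terminal status."""
    start = clock()
    while True:
        response = retrieve(response_id)
        if response["status"] in TERMINAL_STATUSES:
            return response
        if clock() - start > deadline_s:
            raise TimeoutError(f"{response_id} still {response['status']} after {deadline_s}s")
        sleep(interval_s)


# Fake retrieve for illustration: queued, then in progress, then completed.
_statuses = iter(["queued", "in_progress", "completed"])


def fake_retrieve(response_id: str) -> dict:
    return {"id": response_id, "status": next(_statuses)}


result = poll_until_done(fake_retrieve, "resp_demo", interval_s=0)
print(result)
```

Because each poll is a short request, no single HTTP call has to stay open for the full reasoning time; the client-side deadline is then your own choice and not a connection limit.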

If it still fails at about 10 minutes, capture the response/request ID, timestamp, model version, deployment type, Azure Monitor latency/token metrics, and error details, then continue with Microsoft Support as a service or deployment-capacity issue. Also review quota/capacity, consider Provisioned Throughput for predictable latency, and monitor Foundry model/version updates. - https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/latency, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/quota, https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/model-versions
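To keep those details consistent across failed runs, a small record like the following Python sketch may help. The field names and sample values are hypothetical placeholders chosen for illustration, not a format that Microsoft Support requires.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TimeoutReport:
    """Details worth attaching to a support ticket for one failed run."""
    response_id: str
    model_version: str
    deployment_type: str
    reasoning_effort: str
    elapsed_seconds: float
    error: str
    captured_at_utc: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Placeholder values for illustration only.
report = TimeoutReport(
    response_id="<response-or-request-id>",
    model_version="<model-version>",
    deployment_type="<deployment-type>",
    reasoning_effort="high",
    elapsed_seconds=600.0,
    error="<error message returned by the service>",
)
print(report.to_json())
```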

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close out the thread by upvoting this response and accepting it as the answer if it was helpful.
