gpt-4o-mini — Unexplained latency degradation since May 22, both East US and Sweden Central

Question

Oron Karmona 35 Reputation points

Description:

We are experiencing a significant and sustained increase in response latency for our gpt-4o-mini deployment starting May 22, 2026. The degradation is observed simultaneously in both East US and Sweden Central regions.

Observed metrics (Azure Monitor — Azure OpenAI resource):

┌───────────────────────────────┬──────────────────┬──────────────────────────┐
│ Metric                        │ Before May 22    │ After May 22             │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Time to first byte            │ ~1 ms (stable)   │ 4–12 ms (noisy, spiking) │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Time to last byte             │ ~198 ms (stable) │ 800–2,376 ms             │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Number of requests            │ Unchanged        │ Unchanged                │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Token volume (input + output) │ Unchanged        │ Unchanged                │
└───────────────────────────────┴──────────────────┴──────────────────────────┘
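The slowdown factors implied by the table can be checked with a few lines of arithmetic. This sketch simply divides the reported "after" range by the "before" baseline and treats the approximate (~) values as exact; it is not a statistical analysis.

```python
# Before/after values copied from the Azure Monitor table above (milliseconds).
ttfb_before_ms = 1.0
ttfb_after_ms = (4.0, 12.0)
ttlb_before_ms = 198.0
ttlb_after_ms = (800.0, 2376.0)


def slowdown_range(before, after_range):
    """Return (low, high) slowdown factors of an 'after' range relative to a baseline."""
    low, high = after_range
    return low / before, high / before


ttfb_low, ttfb_high = slowdown_range(ttfb_before_ms, ttfb_after_ms)
ttlb_low, ttlb_high = slowdown_range(ttlb_before_ms, ttlb_after_ms)
print(f"TTFB slowdown: {ttfb_low:.1f}x to {ttfb_high:.1f}x")  # TTFB slowdown: 4.0x to 12.0x
print(f"TTLB slowdown: {ttlb_low:.1f}x to {ttlb_high:.1f}x")  # TTLB slowdown: 4.0x to 12.0x
```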

Key observations:

- The change point is clearly May 22: a flat baseline before, degraded after.
- Request volume and token counts are identical before and after, which rules out a load increase.
- TTFB increased alongside TTLB, so this is not an output-length issue; Azure is slower to begin responding.
- Both regions degraded simultaneously, which rules out a region-specific infrastructure problem.
- There are no 429s and no increase in error rate, so this is not a throttling or rate-limit issue.
- LiteLLM proxy latency was validated as pass-through; the added latency is on the Azure side.
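The before/after reasoning above (latency up, volume flat) can be made explicit by splitting a daily series at the suspected change date and comparing medians. This is a minimal sketch: the daily rows below are hypothetical illustration values, not measurements, and the (day, latency, requests) layout is an assumption about how you export the data.

```python
from datetime import date
from statistics import median

# Hypothetical daily aggregates (illustrative only, not real measurements):
# (day, median time-to-last-byte in ms, request count). Replace with your own exports.
daily = [
    (date(2026, 5, 19), 195.0, 10_000),
    (date(2026, 5, 20), 201.0, 10_100),
    (date(2026, 5, 21), 198.0, 9_950),
    (date(2026, 5, 22), 950.0, 10_050),
    (date(2026, 5, 23), 1_400.0, 9_980),
    (date(2026, 5, 24), 2_100.0, 10_020),
]


def compare_around(rows, change_day):
    """Split rows at change_day; return after/before ratios of median latency and median volume."""
    before = [r for r in rows if r[0] < change_day]
    after = [r for r in rows if r[0] >= change_day]
    latency_ratio = median(r[1] for r in after) / median(r[1] for r in before)
    volume_ratio = median(r[2] for r in after) / median(r[2] for r in before)
    return latency_ratio, volume_ratio


lat, vol = compare_around(daily, date(2026, 5, 22))
print(f"latency x{lat:.1f}, volume x{vol:.2f}")  # latency x7.1, volume x1.00
```

A latency ratio well above 1 with a volume ratio near 1 is the pattern described in the observations; medians are used so a few spiky days do not dominate.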

What we need to know:

- Was any backend infrastructure, routing, or model version change applied to gpt-4o-mini deployments around May 22, 2026?
- Is the gpt-4o-mini deployment still serving gpt-4o-mini-2024-07-18 as the underlying model version, or was it silently migrated?
- Is there an active service-side issue affecting request routing or backend handling for this model, similar to the GPT-4.1-mini routing incident from February 2026?
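One client-side way to look at the model-version question is to tally the `model` and `system_fingerprint` fields of logged Chat Completions response bodies over time. This sketch assumes your proxy (for example LiteLLM) keeps the raw JSON bodies; the fingerprint values in the usage example are hypothetical. A stable `model` value does not prove nothing changed on the backend, but a shift in either field around May 22 would be useful evidence for a support ticket.

```python
import json
from collections import Counter

# Version the deployment is expected to serve (taken from the question above).
EXPECTED_MODEL = "gpt-4o-mini-2024-07-18"


def tally_model_versions(raw_responses):
    """Count (model, system_fingerprint) pairs found in raw chat-completion JSON bodies."""
    counts = Counter()
    for body in raw_responses:
        payload = json.loads(body)
        # .get() tolerates bodies where a field is absent (recorded as None).
        counts[(payload.get("model"), payload.get("system_fingerprint"))] += 1
    return counts


def unexpected_models(counts, expected=EXPECTED_MODEL):
    """Return the distinct model names that differ from the expected version."""
    return sorted({model for (model, _fp) in counts if model != expected}, key=str)


# Hypothetical logged bodies (trimmed to the two fields of interest).
logged = [
    '{"model": "gpt-4o-mini-2024-07-18", "system_fingerprint": "fp_a"}',
    '{"model": "gpt-4o-mini-2024-07-18", "system_fingerprint": "fp_b"}',
]
counts = tally_model_versions(logged)
print(counts)
print(unexpected_models(counts))  # []
```

Running the tally separately on responses from before and after May 22 and comparing the two counters makes any change easy to spot.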

Deployment details:

Model: gpt-4o-mini

This gpt-4o-mini deployment serves production end-user requests. Response time has increased roughly 4–12x (time to last byte from ~198 ms to 800–2,376 ms), directly degrading the user experience.

Thank you

Jérôme CAMPO - Betclic 20 Reputation points

2026-05-26T15:52:38.6733333+00:00
Hello,

Same here on the EU Data Zone: gpt-4o-mini TTFB went from 50 ms to almost a second, and TTLB from 500 ms to 8 s! We see the same behavior for gpt-4.1-mini and gpt-4.1!

Is there any known issue ongoing?

Is there any way to get back to the stability of previous months?

Is Azure facing a sudden spike in EU demand for gpt-4o-mini (and the gpt-4.1 family)?

Or is this a side effect of silent hardware decommissioning ahead of the October 2026 end of support?

Thanks for any feedback; gpt-4o-mini rocks!
ASG 10 Reputation points

2026-05-26T20:17:35.27+00:00

Hello,

We are seeing the same latency issues with gpt-4o-mini in the East US (eus) and France Central/Sweden Central (frc/swc) regions.
Matthieu Delanoë 20 Reputation points

2026-05-27T07:06:59.8+00:00

Hi!
Same here on our gpt-4o-mini deployment in Europe, since May 22nd, about 5-6 times slower than before.
Our app relying on this became unusable.

Any news on this @Microsoft?
Steve W 15 Reputation points

2026-05-27T08:04:41.4666667+00:00

Same here: our gpt-4o-mini requests (UK South), which have always responded in under 1 second for the past year or more, are now taking 7–15 seconds. This is plainly unacceptable for chatbot performance!

How can we get back to previous performance levels?
Jacques SALIOU 20 Reputation points

2026-05-27T08:10:38.2966667+00:00

We are experiencing the exact same issue on our side with GPT-4o-mini since last Friday.

Response times have increased significantly compared to the usual behavior, and this is impacting workflows currently running in production environments.

Would appreciate any feedback or clarification from Microsoft / Azure / Azure AI Foundry teams regarding potential incidents, regional degradation, throttling, or ongoing mitigations.
Autofilljobs 0 Reputation points

2026-05-27T17:23:51.93+00:00

Hi all,

Has anyone figured out a solution? Did changing the model help?

Did creating a ticket with Azure help?
Amal Jose Vallavanthara 0 Reputation points

2026-05-29T19:14:06.79+00:00

Hi All,
How did you solve it?
Is it stable now, back to normal speed? We switched to a different model (5 series). It got faster, but it is much more expensive.
Marian Vesper 5 Reputation points

2026-05-30T08:04:07.44+00:00

We are experiencing the same issue in Germany West Central and Sweden Central with 4o-mini and 5-mini deployments. Latency seemed fine on 28.05 but increased drastically again on 29.05.
Has anyone resolved the issue or gotten any response from Microsoft?
The issue first occurred on 26.05.
Amal Jose Vallavanthara 0 Reputation points

2026-05-30T08:11:59.82+00:00

@Marian Vesper We are facing the same issue. It is unusable. We contacted Azure but got no reply. We moved to 5-mini and the latency is fine; we were only seeing the issue with the 4 series.


Answer 1

Matthieu Delanoë 20 Reputation points

Hi all,

Today latency seems to be back to normal on our deployments.

But there is still no answer or confirmation from Microsoft on this.

Amal Jose Vallavanthara 0 Reputation points

2026-05-29T19:13:24.87+00:00

Hi Matthieu,

Is it stable now, back to normal speed? We switched to a different model (5 series). It got faster, but it is much more expensive.

Answer 2

The source material this answer draws on does not contain any information about backend incidents, routing changes, or model-version changes for gpt-4o-mini on or around May 22, 2026, nor any confirmation of a service-side issue similar to the GPT-4.1-mini routing incident.

The only guidance in that material that applies to latency issues is general performance tuning and monitoring:

Use Azure Monitor to validate throughput and latency
- Track:
  - Calls per minute using the Azure OpenAI Requests metric split by ModelDeploymentName.
  - Total tokens per minute using the Processed Inference tokens metric.
- This confirms whether the service is processing the same volume and whether any throttling or capacity effects show up in metrics.
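If you also log calls on the client or proxy side, the same per-minute view (calls and tokens, split by deployment) can be rebuilt locally and compared with the Azure Monitor charts. This is a minimal sketch: the record layout and the deployment names are hypothetical, and it only aggregates what you pass in.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical exported records: (timestamp, deployment name, total tokens for the call).
# The field layout is an assumption; adapt it to however you export metrics or logs.
records = [
    (datetime(2026, 5, 21, 10, 0, 5), "gpt-4o-mini-prod", 850),
    (datetime(2026, 5, 21, 10, 0, 40), "gpt-4o-mini-prod", 920),
    (datetime(2026, 5, 21, 10, 1, 10), "gpt-4o-mini-prod", 780),
    (datetime(2026, 5, 21, 10, 1, 15), "gpt-4o-mini-batch", 1_200),
]


def per_minute_by_deployment(rows):
    """Aggregate calls and tokens per (deployment, minute), mirroring a split by deployment name."""
    buckets = defaultdict(lambda: {"calls": 0, "tokens": 0})
    for ts, deployment, tokens in rows:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        bucket = buckets[(deployment, minute)]
        bucket["calls"] += 1
        bucket["tokens"] += tokens
    return dict(buckets)


for (deployment, minute), agg in sorted(per_minute_by_deployment(records).items()):
    print(deployment, minute.strftime("%H:%M"), agg["calls"], "calls", agg["tokens"], "tokens")
```

If these local counts stay flat across May 22 while latency rises, that matches what the Azure Monitor metrics are meant to confirm.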
Apply the documented latency optimization levers where possible
- Model choice: If latency is critical and the workload allows, test GPT‑4o mini versus other available models to see if another model in the same region provides better latency characteristics.
- Lower max_tokens: Even when actual output length is similar, a higher max_tokens setting increases latency.
- Reduce total tokens: Shorter prompts and responses reduce end‑to‑end latency.
- Enable streaming: Streaming does not reduce total processing time but improves perceived latency by returning tokens as they are generated.
- Content filtering configuration: Content filters improve safety but add latency; where appropriate and compliant, evaluate whether modified content filtering policies are suitable.
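To see how streaming separates perceived latency (first chunk) from total processing time (last chunk), a client can time both ends of a stream. This sketch is generic: `time_stream` is a hypothetical helper that takes any callable returning an iterable of chunks, and the example uses a fake stream rather than a real API call.

```python
import time
from typing import Callable, Iterable, Tuple


def time_stream(start_request: Callable[[], Iterable[object]]) -> Tuple[float, float, int]:
    """Return (seconds to first chunk, seconds to last chunk, chunk count).

    The clock starts immediately before start_request() is called, so time spent
    issuing the request is included in both measurements.
    """
    start = time.perf_counter()
    first = None
    count = 0
    for _chunk in start_request():
        if first is None:
            first = time.perf_counter() - start
        count += 1
    last = time.perf_counter() - start
    # With an empty stream there is no first chunk; report the total time instead.
    return (first if first is not None else last), last, count


def fake_stream():
    # Stand-in for a streaming response: three chunks, roughly 10 ms apart.
    for token in ("Hello", ",", " world"):
        time.sleep(0.01)
        yield token


ttfb, ttlb, n = time_stream(fake_stream)
print(f"first chunk after {ttfb * 1000:.0f} ms, last after {ttlb * 1000:.0f} ms, {n} chunks")
```

With the `openai` Python package's `AzureOpenAI` client, the callable could be something like `lambda: client.chat.completions.create(model="<deployment>", messages=[...], stream=True)`; we have not run that here, and the first streamed chunk may carry no text.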
Next steps for unexplained step‑change latency
- Because the source material states that model availability and behavior are centrally managed and cannot be customized per customer or per region, and it contains no published incident or change record, any investigation of a specific date-bound regression (like May 22, 2026) must go through Azure support.
- Open a support ticket with:
  - Exact resource name and regions.
  - Deployment name(s) for gpt‑4o‑mini.
  - Time window showing the change (before/after May 22) and exported Azure Monitor charts for latency and tokens.
  - Example request IDs and timestamps for slow vs. normal calls.
- The product group can then check for:
  - Backend routing or capacity changes affecting those regions.
  - Model implementation or configuration changes.
  - Any active or historical incidents not reflected in public documentation.
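Picking contrasting examples for the ticket is easier if calls are logged with an ID, timestamp, and latency. This minimal sketch sorts such a log and returns the fastest and slowest calls; the log rows and request IDs are hypothetical, and which request ID you log (for example a response header value) is up to your client.

```python
from statistics import median

# Hypothetical client-side log: (request ID, UTC timestamp, latency in ms).
calls = [
    ("req-a1", "2026-05-21T10:00:05Z", 190),
    ("req-a2", "2026-05-21T10:05:12Z", 205),
    ("req-b1", "2026-05-23T10:00:07Z", 1_850),
    ("req-b2", "2026-05-23T10:04:55Z", 2_300),
    ("req-b3", "2026-05-23T10:09:31Z", 960),
]


def pick_examples(rows, k=2):
    """Return the k fastest and k slowest calls, to contrast normal and slow requests."""
    ordered = sorted(rows, key=lambda r: r[2])
    return ordered[:k], ordered[-k:]


fast, slow = pick_examples(calls)
print("median latency (ms):", median(ms for _, _, ms in calls))  # median latency (ms): 960
for label, group in (("normal", fast), ("slow", slow)):
    for req_id, ts, ms in group:
        print(f"{label}: {req_id} at {ts} took {ms} ms")
```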

The source material contains no statement that gpt-4o-mini has been silently migrated to a different underlying version, nor any confirmation of a known service-side issue on the date described. Only Azure support can confirm subscription- and time-specific backend behavior.
