An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
Hi all,
Today latency seems to be back to normal on our deployments.
But still no answer/confirmation from Microsoft on this
This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Description:
We are experiencing a significant and sustained increase in response latency for our gpt-4o-mini deployment starting May 22, 2026. The degradation is observed simultaneously in both East US and Sweden Central regions.
Observed metrics (Azure Monitor — Azure OpenAI resource):
┌───────────────────────────────┬──────────────────┬──────────────────────────┐
│ Metric │ Before May 22 │ After May 22 │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Time to first byte │ ~1 ms (stable) │ 4–12 ms (noisy, spiking) │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Time to last byte │ ~198 ms (stable) │ 800–2,376 ms │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Number of requests │ Unchanged │ Unchanged │
├───────────────────────────────┼──────────────────┼──────────────────────────┤
│ Token volume (input + output) │ Unchanged │ Unchanged │
└───────────────────────────────┴──────────────────┴──────────────────────────┘
Key observations:
What we need to know:
Deployment details:
gpt-4o-mini is a production model serving end-user requests. Response time has increased 5–12x, directly degrading user experience.
Thank you
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
Hi all,
Today latency seems to be back to normal on our deployments.
But still no answer/confirmation from Microsoft on this
AI answer
The context does not contain any information about backend incidents, routing changes, or model-version changes for gpt‑4o‑mini on or around May 22, 2026, nor any confirmation of a service-side issue similar to a GPT‑4.1‑mini routing incident.
The only supported guidance from the context that applies to latency issues is general performance tuning and monitoring:
ModelDeploymentName.max_tokens: Even when actual output length is similar, a higher max_tokens setting increases latency.The context does not provide any statement that gpt‑4o‑mini has been silently migrated to a different underlying version, nor any confirmation of a known service‑side issue on the date described. Only Azure support can confirm subscription‑ and time‑specific backend behavior.
References: