Hello Jiaqi Zhang,
Welcome to Microsoft Q&A, and thank you for reaching out.
Performance improvements announced by OpenAI are not adopted by Azure OpenAI immediately. Azure rolls out model optimizations on its own schedule, after additional validation, compliance checks, and regional deployment.
What the OpenAI announcement actually means
The post from @OpenAIDevs refers to:
OpenAI-hosted API infrastructure
Optimizations in their inference stack
Same model weights, but faster runtime execution
This improvement applies first to:
api.openai.com
OpenAI-managed endpoints
It does not automatically or instantly apply to Azure OpenAI / Azure AI Foundry.
How Azure OpenAI adopts model improvements
Azure OpenAI is a separate managed service that:
Runs models on Azure-owned infrastructure
Has additional layers for:
Security & compliance
Capacity management
Regional isolation
Enterprise SLAs
Because of this, model optimizations are onboarded gradually, and only after:
Internal validation by Azure
Performance and stability testing
Region-by-region rollout
SKU / deployment compatibility checks
There is no public ETA announced for when a specific OpenAI optimization lands in Azure.
Why you’re seeing >2-minute latency
From the graph you shared (“Time to last byte”), the spikes strongly suggest a combination of:
High regional load
Queueing under rate limits
Non-streaming responses
Possibly Data Zone Standard or constrained regional capacity
This is not unusual for newly released, high-demand models like GPT-5.2 in Azure, especially shortly after public announcements.
Clarification
Deploying GPT-5.2 in Azure AI Foundry does not guarantee parity with OpenAI API latency, even if the model name and version are the same.
Same model ≠ same runtime stack.
What you can do today to reduce latency
While waiting for Azure to adopt the optimization:
Enable streaming responses (biggest UX win)
Reduce max_tokens where possible
Lower concurrency spikes (smooth traffic)
Check region capacity and test another supported region
Monitor Time to First Token (TTFT) vs Time to Last Byte
Watch for throttling in Azure Metrics / Logs
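The first and last points above can be sketched with the OpenAI Python SDK against an Azure endpoint. This is a minimal sketch, not a definitive implementation: the API version, the `gpt-5.2` deployment name, and the `AZURE_OPENAI_*` environment variables are assumptions you should replace with your own values. The `consume_stream` helper that measures TTFT is generic, so you can exercise it without credentials:

```python
import os
import time
from typing import Iterable, Tuple


def consume_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Drain a stream of text pieces; return (text, TTFT seconds, total seconds)."""
    start = time.perf_counter()
    first = None
    parts = []
    for piece in chunks:
        if first is None:
            first = time.perf_counter()  # first token arrived
        parts.append(piece)
    end = time.perf_counter()
    return "".join(parts), (first or end) - start, end - start


# Only call the service when credentials are configured.
if os.environ.get("AZURE_OPENAI_ENDPOINT") and os.environ.get("AZURE_OPENAI_API_KEY"):
    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",  # assumption: use a version your resource supports
    )
    stream = client.chat.completions.create(
        model="gpt-5.2",  # assumption: substitute your actual deployment name
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=128,   # cap output tokens to bound time to last byte
        stream=True,      # stream so the first tokens arrive early
    )
    pieces = (c.choices[0].delta.content or "" for c in stream if c.choices)
    text, ttft, total = consume_stream(pieces)
    print(f"TTFT: {ttft:.2f}s, total: {total:.2f}s")
```

Comparing the two printed numbers makes the diagnosis concrete: if TTFT is small but the total is minutes, the model is generating normally and streaming alone fixes the perceived latency; if TTFT itself is long, the request is queueing and region or quota changes are the lever.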
I hope this helps. Do let me know if you have any further queries.
Thank you!