
When will Azure AI Foundry adopt the latest GPT-5.2 version (40% faster)?

Jiaqi Zhang 0 Reputation points Microsoft Employee
2026-02-10T02:43:53.2+00:00

I deployed the GPT-5.2 model in Azure AI Foundry, but I found that GPT-5.2 is so slow that the response time can exceed 2 minutes.
The OpenAI Developers official account said they have optimized the model and it is now 40% faster. I'm wondering when the Azure OpenAI Service will adopt this change?
https://x.com/OpenAIDevs/status/2018838297221726482

Azure OpenAI Service

An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.


2 answers

Sort by: Most helpful
  1. Alex Burlachenko 19,615 Reputation points Volunteer Moderator
    2026-02-10T12:47:30.5333333+00:00

    hi Jiaqi Zhang,

    Try reviewing and reducing unnecessary context in your prompts, and if possible test another region. If latency is business critical, you may need to temporarily use a different model option or the public OpenAI API, provided that fits your compliance requirements. The "40 percent faster" announcement does not imply an immediate speed-up in Azure: Azure OpenAI updates are rolled out gradually, and there is no announced date for when this optimization will reach Azure AI Foundry.
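The "reduce unnecessary context" advice above can be sketched in code. This is a minimal illustration, not an SDK feature: the message format matches the common chat-completions shape, the token budget is a hypothetical value, and the ~4 characters per token estimate is a rough heuristic (use a real tokenizer such as tiktoken for accurate counts).

```python
# Sketch: keep the system message plus the most recent turns that fit
# under an approximate token budget (~4 characters per token heuristic).
def trim_history(messages, max_tokens=2000):
    def approx_tokens(msg):
        return max(1, len(msg["content"]) // 4)

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(approx_tokens(m) for m in system)
    kept = []
    # Walk backwards so the newest turns are kept first.
    for msg in reversed(rest):
        cost = approx_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return system + list(reversed(kept))


# Example: 50 long user turns trimmed down to fit a 1000-token budget.
history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": "x" * 400} for _ in range(50)]
trimmed = trim_history(history, max_tokens=1000)
```

Smaller prompts reduce both the tokens the service must process and the time spent queueing, which is often the cheapest latency win available today.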

    rgds,

    Alex

    1 person found this answer helpful.

  2. SRILAKSHMI C 15,040 Reputation points Microsoft External Staff Moderator
    2026-02-10T13:20:29.43+00:00

    Hello Jiaqi Zhang,

    Welcome to Microsoft Q&A, and thank you for reaching out.

    OpenAI performance improvements are not adopted by Azure OpenAI immediately. Azure rolls out model optimizations on its own schedule, after additional validation, compliance checks, and regional deployment.

    What the OpenAI announcement actually means

    The post from @OpenAIDevs is referring to:

    OpenAI-hosted API infrastructure

    Optimizations in their inference stack

    Same model weights, but faster runtime execution

    This improvement applies first to:

    api.openai.com

    OpenAI-managed endpoints

    It does not automatically or instantly apply to Azure OpenAI / Azure AI Foundry.

    How Azure OpenAI adopts model improvements

    Azure OpenAI is a separate managed service that:

    Runs models on Azure-owned infrastructure

    Has additional layers for:

    Security & compliance

    Capacity management

    Regional isolation

    Enterprise SLAs


    Because of this, model optimizations are onboarded gradually, and only after:

    Internal validation by Azure

    Performance and stability testing

    Region-by-region rollout

    SKU / deployment compatibility checks

    There is no public ETA announced for when a specific OpenAI optimization lands in Azure.

    Why you’re seeing >2-minute latency

    From the graph you shared (“Time to last byte”), the spikes strongly suggest a combination of:

    High regional load

    Queueing under rate limits

    Non-streaming responses

    Possibly Data Zone Standard or constrained regional capacity

    This is not unusual for newly released, high-demand models like GPT-5.2 in Azure, especially shortly after public announcements.

    Clarification

    Deploying GPT-5.2 in Azure AI Foundry does not guarantee parity with OpenAI API latency, even if the model name and version are the same.

    Same model ≠ same runtime stack.

    What you can do today to reduce latency

    While waiting for Azure to adopt the optimization:

    Enable streaming responses (biggest UX win)

    Reduce max_tokens where possible

    Lower concurrency spikes (smooth traffic)

    Check region capacity and test another supported region

    Monitor Time to First Token (TTFT) vs Time to Last Byte

    Watch for throttling in Azure Metrics / Logs.
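To act on the "Monitor Time to First Token (TTFT) vs Time to Last Byte" point, here is a minimal client-side sketch. It measures both values over any iterable of streamed text pieces; the simulated generator stands in for a real streaming iterator (e.g. the deltas from a `stream=True` chat-completions call), which is an assumption about your client setup.

```python
import time

def measure_stream(chunks):
    """Consume a streamed response and report TTFT and time to last byte.

    `chunks` is any iterable yielding text pieces; with a real SDK you
    would pass the streaming response iterator instead.
    """
    start = time.monotonic()
    ttft = None
    pieces = []
    for piece in chunks:
        if ttft is None:
            # First piece arrived: this is the time to first token.
            ttft = time.monotonic() - start
        pieces.append(piece)
    # Stream exhausted: this is the time to last byte.
    ttlb = time.monotonic() - start
    return "".join(pieces), ttft, ttlb


# Simulated slow stream for illustration only.
def slow_stream():
    for word in ["Hello", ", ", "world"]:
        time.sleep(0.01)
        yield word

reply, ttft, ttlb = measure_stream(slow_stream())
```

If TTFT is small but time to last byte is large, streaming already hides most of the perceived latency; if TTFT itself is large, the delay is queueing or capacity on the service side, which points back at region and throttling checks.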

    I hope this helps. Do let me know if you have any further queries.

    Thank you!

