
GPT-5.2 model returning HTTP 400 errors on Azure AI Foundry

Tarandeep Singh Khurana 20 Reputation points
2026-03-11T06:34:49.39+00:00

Environment:

  • Azure AI Foundry (OpenAI endpoint)
  • Endpoint: https://<resource-name>.openai.azure.com/openai/v1/
  • Model: gpt-5.2
  • SDK: LangChain with OpenAI client

Issue: Our GPT-5.2 model deployment suddenly started returning HTTP 400 (Bad Request) errors for all requests. The deployment was working previously, with no configuration changes on our end.

Error observed: HTTP 400 Bad Request

What we tried:

  • Verified API key and endpoint are correct
  • Confirmed the model deployment exists in Azure portal
  • Tested with same prompts that worked before
  • No changes to request payload format

Workaround: We migrated to gpt-5.4, which is working, but we need clarity on:

  1. Is GPT-5.2 deprecated or experiencing an outage?
  2. Is there a known issue with GPT-5.2 on Azure AI Foundry?
  3. Expected timeline for resolution if this is a service issue

Impact: Production application was affected until we migrated to GPT-5.4.

Azure OpenAI Service

An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.


2 answers

  1. SRILAKSHMI C 16,975 Reputation points Microsoft External Staff Moderator
    2026-03-17T10:04:17.2733333+00:00

    Hi Tarandeep Singh Khurana,

    Thank you for confirming that the previous issue is resolved, and for the detailed analysis.

    The 429 errors you’re seeing with GPT-5.2 and GPT-5.4 are not quota-related, as you correctly identified.

    Instead, they are caused by:

    • High demand on shared (Global Standard) capacity
    • Large per-request token size (~13K–15K tokens)
    • Dynamic service-side throttling during peak load

    During these periods, the service may reject larger requests first to protect overall system stability, even if your TPM quota is not fully utilized.

    Why this affects your scenario more

    Your workload includes:

    • Large system prompts (~11.5K tokens)
    • Tool definitions + context (~1.3K tokens)

    This makes each request relatively heavy, which:

    • Increases compute cost per request
    • Makes it more likely to be throttled under congestion

    Answers to your questions

    1. Can Provisioned Throughput (PT) be expedited?

    We understand the urgency; however, we do not have direct control to expedite approvals.

    2. Interim solution / priority bump for Global Standard?

    There is no manual priority bump available for Global Standard deployments. Capacity is shared and dynamically allocated.

    3. ETA for Provisioned Throughput?

    This depends on region capacity availability and the position in the internal approval queue.

    4. Is there a documented single-request token threshold?

    There is no fixed public threshold.

    However, larger requests (like yours, at ~13K–15K tokens) are more likely to be throttled during peak load; the system uses adaptive limits, not static ones.

    5. Are specific regions congested?

    We don’t have a public list of impacted regions, but:

    • High-demand regions can experience intermittent capacity pressure
    • This can vary throughout the day

    Recommended mitigations

    While waiting for PT approval, here are practical steps to stabilize your workload:

    1. Reduce request size

    • Trim system prompts where possible
    • Move static instructions to shorter representations
    • Minimize tool definitions if not required per request

    Even a 20–30% reduction can significantly reduce throttling probability.

    2. Implement a smarter retry strategy

    You already have exponential backoff; consider adding (see the sketch after this list):

    • Jitter (randomized delay)
    • A longer backoff window for 429s specifically

    3. Traffic smoothing

    • Avoid burst traffic patterns
    • Distribute requests more evenly over time

    4. Multi-region fallback

    • Deploy in an additional region
    • Route traffic when one region is throttled

    Thank you!


  2. Tarandeep Singh Khurana 20 Reputation points
    2026-03-12T09:07:45.51+00:00

    Hi SRILAKSHMI C,
    Thanks for the quick response on the previous outage - that was resolved on our end as well.

    We're now facing a different issue - 429 rate limit errors on both GPT 5.2 and GPT 5.4 deployments.

    Error Responses:

    {
      "error": {
        "code": "429",
        "message": "The system is currently experiencing high demand and cannot process your request. Your request exceeds the maximum usage size allowed during peak load. For improved capacity reliability, consider switching to Provisioned Throughput."
      }
    }

    {
      "error": {
        "code": "429",
        "message": "The server had an error processing your request. Sorry about that! You can retry your request or contact us through an Azure support request at: https://go.microsoft.com/fwlink/?linkid=2213926 if you keep seeing this error. (Please include the request ID e6ddd5fd-1f59-41d1-8ed3-d0fbcf0c9d97 in your email.)."
      }
    }

    Our Analysis:

    • This is NOT a quota issue - our TPM limit (10M) is nowhere near exhausted
    • Our requests are ~13K-15K tokens each (system prompt ~11.5K + tool definitions ~1.3K + user context)
    • The error suggests Azure's global infrastructure is rejecting "large" single requests during peak demand periods
    • We have retry logic with exponential backoff, but the errors persist frequently

    Action Taken: We've already submitted the form for Provisioned Throughput access.

    What we need:

    1. Can you expedite the Provisioned Throughput approval/provisioning?
    2. In the meantime, is there any interim solution or priority bump for our Global Standard deployment?
    3. Any ETA on when PT will be available for our resource?
    4. Is there a documented single-request token threshold during peak load?
    5. Are there specific regions/data centers experiencing sustained congestion?

    This is impacting our production workflows, so a quick turnaround would be appreciated.

    Thanks,

    Tarandeep Singh Khurana

