GPT-5.2 - Can anyone from Microsoft working on Microsoft Foundry confirm that there are issues deploying GPT-5.2?

GS 405 Reputation points
2025-12-15T03:12:25.05+00:00

Can anyone from Microsoft working on Microsoft Foundry confirm that there are issues with GPT-5.2 deployed in Azure?

It works intermittently, and when it does not, we get a 429 (Too Many Requests) error, even with a single user.

TPM is set to 1.47M, so I doubt that it's a rate-limit issue.

Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services, a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform.


7 answers

Sort by: Most helpful
  1. Michael Streif 25 Reputation points
    2025-12-17T09:38:20.22+00:00

    Hello,

We are also experiencing the same problem as the others. Even after increasing the quota to 10 million TPM, I still get the 429 error code in the development environment. This is a big problem, because we cannot use the new GPT-5.2 model in production like this. We have also waited for over two days now, as suggested in the comment from @Hanna Holasava, but we still experience the same issue.

    What can we do now?

    1 person found this answer helpful.

  2. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


Comments have been turned off.

  3. GS 405 Reputation points
    2025-12-16T02:21:37.1966667+00:00

@Cynthia Luijkx I am testing right now, and 5.2 seems to be a little snappier, with no capacity issues or rate-limiting messages.


  4. Anshika Varshney 9,900 Reputation points Microsoft External Staff Moderator
    2025-12-15T12:46:37.1033333+00:00

    Hey Siry, Gaetan,

Welcome to Microsoft Q&A, and thank you for reaching out.

    It sounds like you're having some trouble with deploying the GPT-5.2 model in Azure, particularly experiencing intermittent 429 errors which indicate "too many requests." Let’s break down some potential reasons and solutions.

    1. Rate Limiting: Given that you mentioned your TPM (Tokens Per Minute) is set to 1.47M and still encountering a 429 error, it's essential to confirm if this limit is being exceeded. The GPT-5.2 model has a default TPM of 1,000,000. You might want to monitor your deployment’s rate limit usage through Azure Monitor.
    2. Traffic Patterns: Check if there are peak usage times that correlate with these errors. Sometimes, high traffic load in your region can lead to rate limiting. If possible, try deploying to a different region or distributing the load across multiple regions.
    3. Fallback Mechanisms: Implementing fallback mechanisms can help manage transient faults. Azure provides a built-in Transient Fault Handling framework that can automatically retry failed requests, which may alleviate some of the intermittent issues.
    4. Load Balancing: If you're seeing spikes in usage, load balancing across various instances could help manage the traffic more effectively. Make sure your deployment is set up to distribute requests evenly.
    5. Monitoring and Alerts: Utilize Azure Monitor to keep an eye on performance metrics. This will help you track usage and identify any unusual activity or error spikes.
    6. Software Updates: Ensure that your deployment is running the latest version of the GPT-5.2 model, as improvements and bug fixes can help enhance stability and performance.
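The retry suggestion in point 3 can be sketched as a small helper that backs off exponentially on 429 responses and honors a `Retry-After` hint when the service supplies one. This is a minimal illustration, not the Azure SDK's built-in retry policy; the `RateLimitError` class and `call_with_backoff` helper below are hypothetical names introduced for this sketch.

```python
import random
import time


class RateLimitError(Exception):
    """Raised by the caller when the service returns HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        # Seconds suggested by the Retry-After header, if any.
        self.retry_after = retry_after


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on 429.

    Uses the server-provided Retry-After delay when available,
    otherwise exponential backoff with a little jitter.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # give up after the final retry
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            sleep(delay)
```

In practice `fn` would wrap the actual chat-completions call and translate the SDK's 429 exception into `RateLimitError`; the sleep hook is injectable so the helper is easy to test.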

    Relevant Documentation:

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

    Thank you!


  5. Gowtham CP 7,955 Reputation points Volunteer Moderator
    2025-12-15T04:45:57.8233333+00:00

    Hi Siry, Gaetan,

    Thanks for the question.

    1. Why 429 can happen even with high TPM: The TPM slider in Azure Foundry is not the only limiter. Azure OpenAI also enforces RPM and short-interval burst limits. Token usage is estimated at request start (prompt plus max tokens), so a request can be throttled even when the portal still shows available quota. Reference: https://learn.microsoft.com/azure/ai-foundry/openai/how-to/quota

    2. Why this can affect a single user: Rate limits are evaluated over smaller time windows, not just per minute. A single user, or a few closely spaced requests, can still trigger 429 responses. This is expected behavior. Reference: https://learn.microsoft.com/azure/ai-foundry/openai/quotas-limits

    3. Intermittent behavior: Azure documentation also notes that 429s can occur during backend capacity constraints or high system demand, independent of your configured quota. This explains the intermittent success you're seeing with GPT-5.2. Reference: https://learn.microsoft.com/azure/ai-foundry/openai/reference#http-status-codes
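    The point in item 1, that token usage is counted at request start as prompt tokens plus `max_tokens`, can be illustrated with a toy client-side budget check. This is a simplified sketch of the idea, not Azure's actual quota algorithm; the `TokenBudget` class and its limits are illustrative.

    ```python
    import time


    class TokenBudget:
        """Naive sliding-window TPM budget.

        Each request is charged its *estimated* cost (prompt tokens +
        max_tokens) at submission time, mirroring how quota can be
        consumed before any response tokens are actually generated.
        """

        def __init__(self, tpm_limit, window=60.0, clock=time.monotonic):
            self.tpm_limit = tpm_limit
            self.window = window
            self.clock = clock
            self.events = []  # list of (timestamp, estimated_tokens)

        def try_acquire(self, prompt_tokens, max_tokens):
            now = self.clock()
            # Forget requests that fell out of the window.
            self.events = [(t, c) for t, c in self.events
                           if now - t < self.window]
            cost = prompt_tokens + max_tokens  # estimated, not actual, usage
            if sum(c for _, c in self.events) + cost > self.tpm_limit:
                return False  # this request would be throttled (429)
            self.events.append((now, cost))
            return True
    ```

    Note that a single request with a large `max_tokens` can reserve most of the window's budget, which is one way a lone user can hit 429 despite a seemingly generous TPM setting.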

    I hope this helps. If this answers your question, please accept and upvote to close the thread.

