GPT-5.2 - Can anyone from Microsoft working on Microsoft Foundry confirm that there are issues deploying GPT-5.2?

GS 405 Reputation points
2025-12-15T03:12:25.05+00:00

Can anyone from Microsoft working on Microsoft Foundry confirm that there are issues with GPT-5.2 deployed in Azure?

It works intermittently, and when it does not, we get a 429 (Too Many Requests) error, even with a single user.

TPM is set to 1.47M, so I doubt that it's a rate-limit issue.

Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services, a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform.


7 answers

Sort by: Most helpful
  1. Michael Streif 25 Reputation points
    2025-12-17T09:38:20.22+00:00

    Hello,

We are also experiencing the same problem as the others. Even after increasing the quota to 10 million TPM, I still get the 429 error code in the development environment. This is a big problem, because we cannot use the new GPT-5.2 model in production like this. We have also waited for over two days now, as suggested in the comment from @Hanna Holasava, but we still experience the same issue.

    What can we do now?

    1 person found this answer helpful.

  2. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


Comments have been turned off.

  3. GS 405 Reputation points
    2025-12-16T02:21:37.1966667+00:00

@Cynthia Luijkx I am testing right now, and 5.2 seems to be a little snappier, with no capacity issues or rate-limiting messages.


  4. Anshika Varshney 9,900 Reputation points Microsoft External Staff Moderator
    2025-12-15T12:46:37.1033333+00:00

    Hey Siry, Gaetan,

Welcome to Microsoft Q&A, and thank you for reaching out.

    It sounds like you're having some trouble with deploying the GPT-5.2 model in Azure, particularly experiencing intermittent 429 errors which indicate "too many requests." Let’s break down some potential reasons and solutions.

    1. Rate Limiting: Given that you mentioned your TPM (Tokens Per Minute) is set to 1.47M and still encountering a 429 error, it's essential to confirm if this limit is being exceeded. The GPT-5.2 model has a default TPM of 1,000,000. You might want to monitor your deployment’s rate limit usage through Azure Monitor.
    2. Traffic Patterns: Check if there are peak usage times that correlate with these errors. Sometimes, high traffic load in your region can lead to rate limiting. If possible, try deploying to a different region or distributing the load across multiple regions.
    3. Fallback Mechanisms: Implementing fallback mechanisms can help manage transient faults. Azure provides a built-in Transient Fault Handling framework that can automatically retry failed requests, which may alleviate some of the intermittent issues.
    4. Load Balancing: If you're seeing spikes in usage, load balancing across various instances could help manage the traffic more effectively. Make sure your deployment is set up to distribute requests evenly.
    5. Monitoring and Alerts: Utilize Azure Monitor to keep an eye on performance metrics. This will help you track usage and identify any unusual activity or error spikes.
    6. Software Updates: Ensure that your deployment is running the latest version of the GPT-5.2 model, as improvements and bug fixes can help enhance stability and performance.
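The retry suggestion in point 3 can be sketched as a small helper that backs off exponentially on 429 responses and honors a `Retry-After` hint when the service supplies one. This is a minimal illustration, not the Azure SDK's built-in retry policy; the `RateLimitError` class and `call_with_backoff` helper below are hypothetical names introduced for this sketch.

```python
import random
import time


class RateLimitError(Exception):
    """Raised by the caller when the service returns HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        # Seconds suggested by the Retry-After header, if any.
        self.retry_after = retry_after


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on 429.

    Uses the server-provided Retry-After delay when available,
    otherwise exponential backoff with a little jitter.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # give up after the final retry
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            sleep(delay)
```

In practice `fn` would wrap the actual chat-completions call and translate the SDK's 429 exception into `RateLimitError`; the sleep hook is injectable so the helper is easy to test.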

    Relevant Documentation:

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

    Thank you!


  5. Gowtham CP 7,955 Reputation points Volunteer Moderator
    2025-12-15T04:45:57.8233333+00:00

    Hi Siry, Gaetan,

    Thanks for the question.

    1. Why 429 can happen even with high TPM: The TPM slider in Azure Foundry is not the only limiter. Azure OpenAI also enforces RPM and short-interval burst limits. Token usage is estimated at request start (prompt plus max tokens), so a request can be throttled even when the portal still shows available quota. Reference: https://learn.microsoft.com/azure/ai-foundry/openai/how-to/quota

    2. Why this can affect a single user: Rate limits are evaluated over smaller time windows, not just per minute. A single user, or a few closely spaced requests, can still trigger 429 responses. This is expected behavior. Reference: https://learn.microsoft.com/azure/ai-foundry/openai/quotas-limits

    3. Intermittent behavior: Azure documentation also notes that 429s can occur during backend capacity constraints or high system demand, independent of your configured quota. This explains the intermittent success you're seeing with GPT-5.2. Reference: https://learn.microsoft.com/azure/ai-foundry/openai/reference#http-status-codes
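    The point in item 1, that token usage is counted at request start as prompt tokens plus `max_tokens`, can be illustrated with a toy client-side budget check. This is a simplified sketch of the idea, not Azure's actual quota algorithm; the `TokenBudget` class and its limits are illustrative.

    ```python
    import time


    class TokenBudget:
        """Naive sliding-window TPM budget.

        Each request is charged its *estimated* cost (prompt tokens +
        max_tokens) at submission time, mirroring how quota can be
        consumed before any response tokens are actually generated.
        """

        def __init__(self, tpm_limit, window=60.0, clock=time.monotonic):
            self.tpm_limit = tpm_limit
            self.window = window
            self.clock = clock
            self.events = []  # list of (timestamp, estimated_tokens)

        def try_acquire(self, prompt_tokens, max_tokens):
            now = self.clock()
            # Forget requests that fell out of the window.
            self.events = [(t, c) for t, c in self.events
                           if now - t < self.window]
            cost = prompt_tokens + max_tokens  # estimated, not actual, usage
            if sum(c for _, c in self.events) + cost > self.tpm_limit:
                return False  # this request would be throttled (429)
            self.events.append((now, cost))
            return True
    ```

    Note that a single request with a large `max_tokens` can reserve most of the window's budget, which is one way a lone user can hit 429 despite a seemingly generous TPM setting.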

    I hope this helps. If this answers your question, please accept and upvote to close the thread.

