OpenAI request hits 429 when rate limit is not reached

Yuyi Zhou 0 Reputation points
2025-06-12T21:39:35.41+00:00

Hi,

I deployed a GPT 4o model. The rate limits are

Rate limit (Tokens per minute): 50,000

Rate limit (Requests per minute): 500

500 requests per minute is around 8 requests per second.

I got 429 Too Many Requests on the 7th request within 3 seconds. Each request is around 200 tokens.

Why is 429 Too Many Requests returned when the rate limit is not hit?

[20:30:00] [INFO] [dku.utils]  - 2025-06-12 20:30:00,660 INFO HTTP Request: POST https://azure-ai-canada-east.openai.azure.com/openai/deployments/gpt-4o-canada-east/chat/completions?api-version=2024-12-01-preview "HTTP/1.1 200 OK"

[20:30:01] [INFO] [dku.utils]  - 2025-06-12 20:30:01,067 INFO HTTP Request: POST https://azure-ai-canada-east.openai.azure.com/openai/deployments/gpt-4o-canada-east/chat/completions?api-version=2024-12-01-preview "HTTP/1.1 200 OK"

[20:30:01] [INFO] [dku.utils]  - 2025-06-12 20:30:01,418 INFO HTTP Request: POST https://azure-ai-canada-east.openai.azure.com/openai/deployments/gpt-4o-canada-east/chat/completions?api-version=2024-12-01-preview "HTTP/1.1 200 OK"

[20:30:01] [INFO] [dku.utils]  - 2025-06-12 20:30:01,861 INFO HTTP Request: POST https://azure-ai-canada-east.openai.azure.com/openai/deployments/gpt-4o-canada-east/chat/completions?api-version=2024-12-01-preview "HTTP/1.1 200 OK"

[20:30:02] [INFO] [dku.utils]  - 2025-06-12 20:30:02,295 INFO HTTP Request: POST https://azure-ai-canada-east.openai.azure.com/openai/deployments/gpt-4o-canada-east/chat/completions?api-version=2024-12-01-preview "HTTP/1.1 200 OK"

[20:30:02] [INFO] [dku.utils]  - 2025-06-12 20:30:02,816 INFO HTTP Request: POST https://azure-ai-canada-east.openai.azure.com/openai/deployments/gpt-4o-canada-east/chat/completions?api-version=2024-12-01-preview "HTTP/1.1 200 OK"

[20:30:02] [INFO] [dku.utils]  - 2025-06-12 20:30:02,884 INFO HTTP Request: POST https://azure-ai-canada-east.openai.azure.com/openai/deployments/gpt-4o-canada-east/chat/completions?api-version=2024-12-01-preview "HTTP/1.1 429 Too Many Requests"

Azure OpenAI Service

1 answer

  1. Prashanth Veeragoni 5,090 Reputation points Microsoft External Staff Moderator
    2025-06-13T05:36:52.2066667+00:00

    Hi Yuyi Zhou,

    Azure OpenAI's quota feature lets you assign rate limits to your deployments, up to a global limit called your "quota". Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). Your subscription is onboarded with a default quota for most models.

    Refer to this document for default TPM values. You can allocate TPM among deployments until reaching quota. If you exceed a model's TPM limit in a region, you can reassign quota among deployments or request a quota increase. Alternatively, if viable, consider creating a deployment in a new Azure region in the same geography as the existing one.

    TPM rate limits are based on the maximum number of tokens estimated to be processed when the request is received. This differs from the token count used for billing, which is computed after all processing is completed. Azure OpenAI calculates a maximum processed-token count per request using:

    ·   Prompt text and count

    ·   The max_tokens setting

    ·   The best_of setting

    This estimated count is added to a running token count of all requests, which resets every minute. A 429 response is returned once the TPM rate limit is reached within the minute. Note that although quota is expressed per minute, enforcement is evaluated over shorter windows (on the order of seconds), so a short burst of requests can trigger a 429 even when the full minute's budget has not been used.
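    Because the estimate includes max_tokens and best_of (not the tokens actually generated), a "200-token" request can count for far more against the TPM limit. A rough sketch of the counted quantity (an illustrative approximation, not Azure's exact algorithm; the real service counts prompt tokens with the model's tokenizer):

```python
def estimated_rate_limit_tokens(prompt_tokens: int,
                                max_tokens: int,
                                best_of: int = 1) -> int:
    """Approximate per-request token count charged against the TPM
    rate limit: the prompt size plus the *maximum* possible completion
    size (max_tokens x best_of), estimated when the request arrives."""
    return prompt_tokens + max_tokens * best_of

# A request with a 200-token prompt and max_tokens=1000 is counted
# as 1200 tokens, not 200:
print(estimated_rate_limit_tokens(prompt_tokens=200, max_tokens=1000))  # 1200
```

    Keeping max_tokens close to what you actually need reduces this estimate and therefore how quickly you consume TPM.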

    To minimize issues related to rate limits, it's a good idea to use the following techniques:

    1. Implement retry logic in your application.

    2. Avoid sharp changes in the workload. Increase the workload gradually.

    3. Test different load increase patterns.

    4. Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
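    For technique 1, a common pattern is exponential backoff that honors the Retry-After header the service sends with a 429. A minimal sketch, assuming a generic call() that returns a status code, headers, and body (stand-ins for your actual client code):

```python
import random
import time


def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on 429 responses with exponential backoff and jitter.

    `call` must return a (status_code, headers, body) tuple.
    """
    for attempt in range(max_attempts):
        status, headers, body = call()
        if status != 429:
            return status, headers, body
        # Prefer the server-suggested wait when present.
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
        time.sleep(delay + random.uniform(0, 0.25))  # jitter avoids bursts in lockstep
    return status, headers, body  # give up after max_attempts
```

    The official OpenAI/Azure SDKs also ship built-in retry behavior; this sketch is mainly useful when you call the REST endpoint directly, as the logs above do.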

    Remember to optimize these settings based on your specific needs.

    Resources:

    ·   Optimizing Azure OpenAI: A Guide to Limits, Quotas, and Best Practices

    ·   Azure OpenAI Service quotas and limits

    ·   Azure OpenAI Insights: Monitoring AI with Confidence

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

    Thank you! 

