429 error while running model in Azure AI Foundry

Shubham Jalewa
2025-06-13T07:19:16.7733333+00:00

Hello team,

Yesterday, while running DeepSeek via Azure AI Foundry, we received a 429 error. The documentation we checked says this error occurs when there are too many requests. Can you help me with how to fix it?

Regards

Shubham

Azure AI services

2 answers

  1. Prashanth Veeragoni, Microsoft External Staff Moderator
    2025-06-13T09:55:33.61+00:00

    Hi Shubham Jalewa,

    Azure OpenAI's quota feature enables assignment of rate limits to your deployments, up to a global limit called your “quota”. Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). Your subscription is onboarded with a default quota for most models.

    Refer to this document for default TPM values. You can allocate TPM among deployments until reaching quota. If you exceed a model's TPM limit in a region, you can reassign quota among deployments or request a quota increase. Alternatively, if viable, consider creating a deployment in a new Azure region in the same geography as the existing one.

    TPM rate limits are based on the maximum number of tokens estimated to be processed when the request is received. This differs from the token count used for billing, which is computed after all processing is completed. Azure OpenAI calculates a maximum processed-token count per request using:

    ·    Prompt text and count

    ·    The max_tokens setting

    ·    The best_of setting

    This estimated count is added to a running token count of all requests, which resets every minute. A 429 response code is returned once the TPM rate limit is reached within the minute.
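    To get a feel for how quickly a workload consumes TPM, you can roughly estimate the per-request token count yourself. The sketch below is illustrative only and is not Azure's exact internal formula; it assumes the tiktoken package for counting prompt tokens, and the tokenizer name, prompt, max_tokens, and best_of values are placeholders you would adapt to your own model and requests.

```python
# Illustrative only: a rough estimate of the tokens one request may count
# against the TPM limit; this is not Azure's exact internal formula.
import tiktoken

def estimate_request_tokens(prompt: str, max_tokens: int, best_of: int = 1) -> int:
    # Tokenizer is an assumption; swap in the encoding that matches your model.
    encoding = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(encoding.encode(prompt))
    # The completion allowance (max_tokens) is counted up front, once per candidate.
    return prompt_tokens + max_tokens * best_of

prompt = "Summarize the latest quarterly report in three bullet points."
print(estimate_request_tokens(prompt, max_tokens=800))  # roughly prompt tokens + 800
```

    Dividing your TPM quota by this per-request estimate gives a rough ceiling on how many such requests fit in one minute before 429 responses start appearing.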

    To minimize issues related to rate limits, it's a good idea to use the following techniques:

    1.  Implement retry logic in your application (a minimal sketch follows after this list).

    2.  Avoid sharp changes in the workload. Increase the workload gradually.

    3.  Test different load increase patterns.

    4.  Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.

    Remember to optimize these settings based on your specific needs.
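    For point 1, here is a minimal retry-with-backoff sketch. It assumes you reach the deployment through an OpenAI-compatible endpoint using the openai Python package (v1.x); the endpoint, API key, API version, and deployment name are placeholders, not values from your environment.

```python
import os
import random
import time

from openai import AzureOpenAI, RateLimitError

# Placeholders: point these at your own resource.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # assumed API version; use the one your resource supports
)

def chat_with_retry(messages, deployment="my-deployment", max_retries=5):
    """Call the deployment, backing off and retrying when a 429 is returned."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except RateLimitError as err:
            # Honor Retry-After if the service supplies it; otherwise back off
            # exponentially with a little jitter so retries do not align.
            retry_after = err.response.headers.get("retry-after")
            delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            time.sleep(delay)
    raise RuntimeError("Still rate limited after retries; consider a quota increase.")

# Example:
# chat_with_retry([{"role": "user", "content": "Hello"}])
```

    Waiting the server-suggested Retry-After time when present, and adding jitter otherwise, also helps with point 2 by smoothing bursts instead of retrying in lockstep.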

    Resources:

    ·   Optimizing Azure OpenAI: A Guide to Limits, Quotas, and Best Practices

    ·   Azure OpenAI Service quotas and limits

    ·   Azure OpenAI Insights: Monitoring AI with Confidence

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

    Thank you! 


  2. Danny Dang, Independent Advisor
    2025-06-16T07:14:18.9566667+00:00

    Hi Shubham,

    Thank you for contacting the Q&A forum. The 429 error you're encountering typically occurs when the number of requests exceeds the allowed quota. To resolve this, you need to increase the quota for your DeepSeek model via a support request.

    Steps to Fix:

    1. Understand the Error: The 429 error indicates that there are too many requests being made. You can find more details in the Azure documentation here.
    2. Request a quota increase: To increase the quota, follow the instructions in the official Azure documentation linked below.
    3. Submit a Support Request: Navigate to the Azure portal and submit a support request to increase the quota for your DeepSeek model. Ensure you provide all necessary details about your current usage and the required increase.
    4. Implement retry logic: While waiting for the quota increase, implement retry logic in your application to handle 429 errors gracefully (a minimal sketch follows below). This can help manage the load and reduce the frequency of errors.
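    For step 4, a rough sketch of such handling at the raw HTTP level is shown below, using the requests library and honoring the Retry-After header when the service returns one. The endpoint URL, authentication header, and payload are placeholders; adapt them to how your Foundry deployment is actually called and authenticated.

```python
import time

import requests

# Placeholders: replace with your deployment's endpoint and key.
# The auth header name may differ depending on how your deployment authenticates.
ENDPOINT = "https://<your-resource>.services.ai.azure.com/models/chat/completions"
HEADERS = {"api-key": "<your-key>", "Content-Type": "application/json"}

def post_with_429_retry(payload, max_retries=5):
    """POST to the model endpoint, retrying on 429 and honoring Retry-After."""
    for attempt in range(max_retries):
        resp = requests.post(ENDPOINT, headers=HEADERS, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Wait the server-suggested time, or fall back to exponential backoff.
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("Rate limited on every attempt; request a quota increase.")
```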

    References:

    If I have answered your question, please accept this answer as a token of appreciation and don't forget to give a thumbs up for "Was it helpful"!

    Best Regards,

