rate_limit_exceeded: Rate limit is exceeded. Try again in 86400 seconds. RunId: run_b6zXnPtFas3Pj8Ixzj6IOv

MarcHornung-7035 131 Reputation points
2025-02-20T17:39:41.63+00:00

I know this has been asked in various locations, but I have not yet found a solution. In Azure AI Foundry I have a Pay-As-You-Go plan and have set up a hub and project and uploaded a test document for use in the playground, just as the tutorial shows. I can query "Hello" and the agent replies "Hello! How can I assist you today?" After that, the entire system refuses to process any other queries. Some requests report rate limit exceeded, wait 29 seconds, but more usually it says wait 86400 seconds. I requested a quota increase, but that has not helped. The top of the page shows 3,598 tokens, which I assume are available. The number changes, but the rate limit error is still the only reply.

Azure OpenAI Service

1 answer

  1. SriLakshmi C 6,250 Reputation points Microsoft External Staff Moderator
    2025-02-21T01:21:17.4433333+00:00

    Hello MarcHornung-7035,

    Greetings and Welcome to Microsoft Q&A!

    I understand that you are experiencing rate limit issues with Azure OpenAI Service.

    When I try to reproduce the issue, I sometimes run into the same problem even though the rate limits should not be hit. The behavior can depend on the specific model you have deployed. To resolve this, try deleting the model deployment, refreshing the environment, and redeploying it; this can reset any underlying configuration issues and improve performance.
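    If you prefer to script that delete-and-redeploy step, here is a minimal sketch using the azure-mgmt-cognitiveservices Python SDK. The subscription, resource group, account, deployment, model, and capacity values are placeholders for illustration, so substitute your own:

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
    from azure.mgmt.cognitiveservices.models import (
        Deployment, DeploymentModel, DeploymentProperties, Sku,
    )

    # Placeholder identifiers -- replace with your own values.
    SUB = "<subscription-id>"
    RG, ACCOUNT, DEPLOYMENT = "<resource-group>", "<aoai-account>", "<deployment-name>"

    client = CognitiveServicesManagementClient(DefaultAzureCredential(), SUB)

    # Delete the existing deployment and wait for the operation to finish.
    client.deployments.begin_delete(RG, ACCOUNT, DEPLOYMENT).result()

    # Recreate it. The sku capacity is the TPM allocation in thousands
    # (e.g. 10 = 10K tokens per minute); model name/version are examples.
    client.deployments.begin_create_or_update(
        RG, ACCOUNT, DEPLOYMENT,
        Deployment(
            sku=Sku(name="Standard", capacity=10),
            properties=DeploymentProperties(
                model=DeploymentModel(
                    format="OpenAI", name="gpt-4o-mini", version="2024-07-18",
                ),
            ),
        ),
    ).result()
    ```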

    Also, keep the following points in mind:

    Azure enforces strict limits on token consumption per minute, hour, and day, as well as the rate of API calls per second or minute.

    Additionally, new accounts may have hard limits imposed by Microsoft, restricting overall usage. To resolve this, navigate to Azure Portal → OpenAI Service → Usage & Quotas and review the rate limits for the deployed model, such as GPT-4 or GPT-3.5.

    If you have requested a quota increase, ensure that it has been approved and applied, as Azure does not always process increases instantly. Some accounts may also have a daily cap that prevents further usage even when quota appears available.

    Certain Azure regions enforce lower usage limits, particularly for new accounts. If your Azure OpenAI instance is in a restricted region, try deploying the model in another region such as East US or West Europe, where limits may be higher. If you are using Azure AI Foundry, check whether Foundry itself imposes additional rate limits separate from standard Azure OpenAI restrictions.

    To monitor API usage and diagnose rate limit issues, enable metrics in the Azure Portal → Monitor → Metrics section. Review logs for Throttled Requests and Rate Limits Reached errors.
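    As a sketch of pulling those numbers programmatically, the azure-monitor-query package can query the same metrics. The resource ID below is a placeholder and the metric name is an assumption, so verify the exact metric names your resource exposes under Monitor → Metrics:

    ```python
    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import MetricAggregationType, MetricsQueryClient

    # Placeholder resource ID of the Azure OpenAI (Cognitive Services) account.
    resource_id = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.CognitiveServices/accounts/<account-name>"
    )

    client = MetricsQueryClient(DefaultAzureCredential())

    # "TokenTransaction" is an assumed metric name -- confirm it in the portal.
    response = client.query_resource(
        resource_id,
        metric_names=["TokenTransaction"],
        timespan=timedelta(hours=24),
        granularity=timedelta(hours=1),
        aggregations=[MetricAggregationType.TOTAL],
    )

    # Print hourly totals so spikes that trigger throttling stand out.
    for metric in response.metrics:
        for series in metric.timeseries:
            for point in series.data:
                print(point.timestamp, point.total)
    ```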

    You can also run an Azure CLI command to check your quota and identify potential bottlenecks. By following these steps, you can better understand your usage limitations and take corrective actions to optimize your Azure OpenAI deployment.
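    For example, a small Python wrapper around the CLI (a sketch assuming the Azure CLI is installed and you have signed in with az login; the region is an example) could look like this:

    ```python
    import json
    import subprocess

    # List quota usage for Cognitive Services / Azure OpenAI in a region.
    # "eastus" is an example -- use the region your resource is deployed in.
    result = subprocess.run(
        ["az", "cognitiveservices", "usage", "list", "--location", "eastus"],
        capture_output=True, text=True, check=True,
    )

    # Each entry reports a current value against its limit, which makes
    # bottlenecks (e.g. a deployment at its TPM cap) easy to spot.
    for usage in json.loads(result.stdout):
        print(f"{usage['name']['value']}: {usage['currentValue']} / {usage['limit']}")
    ```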

    Also, to minimize issues related to rate limits, it's a good idea to use the following techniques:

    • Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don't set a large max_tokens value if you expect your responses to be small.
    • Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.
    • Implement retry logic in your application (see the sketch after this list).
    • Avoid sharp changes in the workload. Increase the workload gradually.
    • Test different load increase patterns.
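    As a minimal sketch of the max_tokens and retry points above, assuming the openai v1 Python SDK (the endpoint, key, API version, and deployment name are placeholders):

    ```python
    import time

    from openai import AzureOpenAI, RateLimitError

    # Placeholders -- substitute your own endpoint, key, API version, and
    # deployment name (the name you gave the deployment, not the model family).
    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",
        api_key="<your-api-key>",
        api_version="2024-06-01",
    )

    def chat_with_retry(messages, max_retries=5):
        delay = 2.0
        for _ in range(max_retries):
            try:
                return client.chat.completions.create(
                    model="<deployment-name>",
                    messages=messages,
                    max_tokens=256,  # keep as small as your scenario allows
                )
            except RateLimitError as e:
                # Prefer the service's Retry-After header when present;
                # otherwise back off exponentially so load ramps up gradually.
                retry_after = e.response.headers.get("retry-after")
                time.sleep(float(retry_after) if retry_after else delay)
                delay *= 2
        raise RuntimeError("Still rate limited after retries; check quota/TPM.")

    response = chat_with_retry([{"role": "user", "content": "Hello"}])
    print(response.choices[0].message.content)
    ```

    Honoring the Retry-After header keeps the client from hammering the endpoint, which matters when the service is quoting waits as long as 86400 seconds.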

    For more information, please refer to Azure OpenAI Service models and Azure OpenAI Service quotas and limits.

    Hope this helps. Do let me know if you have any further queries.

    Thank you!

