Rate limit exceeded when trying to use assistant

Chris Docker 0 Reputation points
2025-01-23T16:38:34.0133333+00:00

When trying to use the assistant in Azure AI Foundry, the first prompt works, but the second prompt immediately throws a rate limit exceeded error.

User's image

From what I can see in the deployment setup, it should allow 48 responses per minute but at the moment it only allows the one. Not sure where this is going wrong at the moment

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
3,633 questions

1 answer

  1. SriLakshmi C 2,490 Reputation points Microsoft Vendor
    2025-01-23T19:38:29.9333333+00:00

    Hello Chris Docker,

    Greetings and Welcome to Microsoft Q&A! Thanks for posting the question.

    The "rate_limit_exceeded" error suggests that you've exceeded the allowed request threshold.

    I attempted to reproduce the issue in my environment, and it worked correctly for me. Here is the screenshot:

    User's image

    Ensure your subscription's quota supports the usage level, and consider adjusting rate limits or scaling your deployment.

    Please refer to "Manage and increase quotas for resources with Azure AI Studio" for more details.

    Check for unintended calls that might be hitting the endpoint and review logs for patterns.

    Please see "Manage Azure OpenAI Service quota" for more details.

    You could also try increasing the limit on your deployment.

    User's image

    Also, to minimize issues related to rate limits, it's a good idea to use the following techniques:

    • Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don't set a large max_tokens value if you expect your responses to be small.
    • Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.
    • Implement retry logic in your application.
    • Avoid sharp changes in the workload. Increase the workload gradually.
    • Test different load increase patterns.
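
    For the retry logic item above, a minimal sketch in Python might look like the following. It is only an illustration under stated assumptions: `RateLimitError` and `call_with_backoff` are hypothetical names, not part of any Azure SDK, and the delay values are placeholders. In a real application you would catch whatever exception your client library raises for a 429 / rate_limit_exceeded response.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the 429 / rate_limit_exceeded error a client raises."""

    def __init__(self, message="rate_limit_exceeded", retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds suggested by the service, if known


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, wait and retry with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Prefer the wait suggested by the service; otherwise double the delay each attempt.
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * (2 ** attempt)
            delay = min(delay, max_delay)
            # Add up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

    You would wrap the request in a function and pass it in, for example `call_with_backoff(lambda: send_prompt(text))`, where `send_prompt` is a placeholder for your own call to the deployment. The `sleep` parameter exists only so the waiting can be replaced in tests.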

    Hope this helps. Do let me know if you have any further queries.

    Thank you!
