Azure AI Studio with a connected vector store gives "rate_limit_exceeded: Rate limit is exceeded." every time.

Lukáš Sitta 20 Reputation points
2024-10-25T07:54:51.7533333+00:00

I am trying to create a custom bot in Azure AI studio that will work with attached file/files.

Currently I am testing with just a single file stored in a vector store.

Every time I ask the bot a question about the information stated in the file, I get:

rate_limit_exceeded: Rate limit is exceeded. Try again in 30 seconds. RunId: run_0HK5UP3zdrnH0hWszGkCcb53:

All my resources are in Norway West. Microsoft support extended my quota for gpt-4o in this region to 300. Yet the problem still occurs.

Thank you.

1 answer

  1. AshokPeddakotla-MSFT 34,611 Reputation points
    2024-10-25T10:11:53.7533333+00:00

    Lukáš Sitta, greetings!

    rate_limit_exceeded: Rate limit is exceeded. Try again in 30 seconds. RunId: run_0HK5UP3zdrnH0hWszGkCcb53:

    The error is related to rate limiting, a common practice in APIs to prevent abuse and ensure fair usage.

    In your case, the error message indicates that you’ve exceeded the token rate limit of your current AI Services S0 pricing tier.

    Did you check whether you have exceeded the quota limit for your Azure OpenAI resources? You can view your quotas and limits in the Model quota section of Azure AI Studio.

    Please see Manage and increase quotas for resources with Azure AI Studio for more details.

    You could also try increasing the rate limit on your deployment.

    Also, see Autoscale Azure AI limits and let me know if that helps in your scenario.

    To minimize issues related to rate limits, it's a good idea to use the following techniques:

    • Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don't set a large max_tokens value if you expect your responses to be small.
    • Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.
    • Implement retry logic in your application (a minimal sketch follows this list).
    • Avoid sharp changes in the workload. Increase the workload gradually.
    • Test different load increase patterns.
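    As a rough illustration of the retry point above, here is a minimal sketch using the openai Python SDK (v1.x) with exponential backoff. The deployment name, environment variable names, and API version below are placeholders, not values from your setup, so adjust them to your resource; the same backoff pattern applies to Assistants API runs against a vector store.

    ```python
    import os
    import time

    from openai import AzureOpenAI, RateLimitError

    # Placeholder configuration -- substitute your own endpoint, key, and deployment name.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-05-01-preview",
    )

    def ask_with_retry(question: str, max_retries: int = 5) -> str:
        """Send a question, backing off exponentially whenever a 429 is returned."""
        delay = 2  # seconds; doubled after every rate-limit hit
        for attempt in range(max_retries):
            try:
                response = client.chat.completions.create(
                    model="gpt-4o",   # your deployment name
                    messages=[{"role": "user", "content": question}],
                    max_tokens=256,   # keep this as small as your scenario allows
                )
                return response.choices[0].message.content
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(delay)
                delay *= 2
        return ""

    print(ask_with_retry("Summarize the key points of the attached document."))
    ```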

    In case you have already extended the limits and are still seeing the issue, please contact support for further assistance.

    Hope this helps. Do let me know if you have any further queries.

