Rate Limit Exceeded Azure OpenAI Standard

Barry Briggs 110 Reputation points
2024-12-04T21:44:07.3933333+00:00

Connected Azure OpenAI to small nonvectorized data set in Azure AI Search. Responses in Azure OpenAI Chat Playground are set to be limited to the dataset. (GPT-4o, Standard S0)

When I ask one of the sample questions (in the Chat UI) which I know to not be in the dataset it returns "The requested information is not available in the retrieved data. Please try another query or topic" -- which is correct.

When I ask a question I know to be in the Azure AI Search index, it consistently returns "Server responded with status 429. Error message: {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 59 seconds.'}}" -- no matter how long I wait.

I've only typed in a half-dozen or so very short (6 words-ish) prompts and updated the system prompt (~35-40 words). I've waited much longer than 59 seconds between prompts.

Azure OpenAI Service

Accepted answer
  1. Max Lacy 345 Reputation points
    2024-12-05T13:34:19.76+00:00

    I understand you are experiencing a rate-limit issue when using the "connect your data" feature in the Azure OpenAI Chat Playground, with responses limited to those derived from your dataset.

    When a deployment is created, its assigned tokens-per-minute (TPM) quota maps directly to the tokens-per-minute rate limit enforced on its inferencing requests. A requests-per-minute (RPM) rate limit is also enforced, set proportionally to the TPM assignment at the following ratio:

    6 RPM per 1,000 TPM.
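    The ratio above can be sketched as a small helper. This is an illustrative function (not part of any Azure SDK), assuming the 6-RPM-per-1,000-TPM ratio stated above:

    ```python
    def rpm_limit(tpm_quota: int) -> float:
        """Estimate the enforced requests-per-minute limit for a deployment.

        Assumes the documented ratio of 6 RPM per 1,000 TPM; the actual
        enforced value is shown in the deployment's quota settings.
        """
        return tpm_quota / 1000 * 6

    # A 30,000 TPM deployment allows roughly 180 requests per minute.
    print(rpm_limit(30_000))  # → 180.0
    ```

    So even a modest TPM quota can be exhausted quickly in RPM terms when each Playground turn triggers multiple backend calls.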

    Connecting to your data increases the number of calls per minute. The flow of an API call becomes Assistants API -> AI Search -> Assistants API, so each Playground turn consumes multiple requests. If you're returning a large dataset, that can also trigger the rate limit.

    To solve your problem, look at increasing your tokens-per-minute quota for the deployment in the Azure AI Foundry portal. This proportionally raises the allowed RPM, so you will hit rate limits less often.
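    Until the quota is raised, clients can also handle 429 responses by backing off and retrying. Below is a minimal, library-agnostic sketch: `send_request` is a hypothetical placeholder for your actual chat-completion call, and real code would prefer the `Retry-After` header value returned with the 429 over the fixed exponential backoff used here:

    ```python
    import time

    def call_with_retry(send_request, max_retries=5, base_delay=1.0):
        """Retry a callable returning (status_code, body) on HTTP 429.

        Uses capped exponential backoff between attempts; `send_request`
        stands in for whatever issues the Azure OpenAI request.
        """
        for attempt in range(max_retries):
            status, body = send_request()
            if status != 429:
                return body
            # Wait 1s, 2s, 4s, ... (capped at 60s) before retrying
            time.sleep(min(base_delay * 2 ** attempt, 60.0))
        raise RuntimeError("Rate limit still exceeded after retries")
    ```

    This doesn't remove the underlying quota pressure, but it smooths over the transient 429s that the doubled call pattern (chat plus search) produces.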

    Screenshot of the deployment UI of Azure AI Foundry

    1 person found this answer helpful.
