How do Azure OpenAI rate limits work?

JAIN Saumya 20 Reputation points
2023-07-25T02:48:16.7766667+00:00

For the Requests Per Minute (RPM) limit, is each minute a fixed clock interval (for example, 12:00 to 12:01), or is it a sliding window? For example, if I send a request at 12:02:10, is the minute counted from 12:02:10 to 12:03:10?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. romungi-MSFT 48,541 Reputation points Microsoft Employee
    2023-07-25T09:01:45.61+00:00

    @JAIN Saumya RPM rate limits are based on the number of requests received over time. Azure OpenAI evaluates the rate of incoming requests over a short interval, typically 1 or 10 seconds, and determines whether the rate limit is being exceeded. If it estimates that the rate would exceed the limit, it returns error 429. See this section of the documentation for a better understanding of how this works.
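To illustrate the short-interval evaluation described above, here is a minimal Python sketch. It is a simplified model and not the service's actual algorithm: it assumes the per-minute quota is divided evenly across fixed evaluation windows, and the function names `per_window_allowance` and `first_throttled_request` are hypothetical.

```python
def per_window_allowance(rpm_limit: int, window_seconds: float) -> float:
    """Share of a per-minute quota that fits in one evaluation window.

    Assumption: the quota is spread evenly over the minute.
    """
    return rpm_limit * window_seconds / 60.0


def first_throttled_request(request_times, rpm_limit, window_seconds=1.0):
    """Return the timestamp (in seconds) of the first request that pushes a
    window over its share of the quota, or None if no window is exceeded.

    Simplification: windows are fixed buckets of `window_seconds`.
    """
    allowance = per_window_allowance(rpm_limit, window_seconds)
    counts = {}
    for t in sorted(request_times):
        bucket = int(t // window_seconds)
        counts[bucket] = counts.get(bucket, 0) + 1
        if counts[bucket] > allowance:
            return t
    return None


# Five requests within half a second against a 60 RPM quota: under this
# model the second request already exceeds the one-per-second share.
burst = [0.0, 0.1, 0.2, 0.3, 0.4]
throttled_at = first_throttled_request(burst, rpm_limit=60)

# Sixty requests spaced one second apart stay within the same quota.
steady = [float(i) for i in range(60)]
steady_result = first_throttled_request(steady, rpm_limit=60)
```

Under this model a burst can be rejected even though the total for the minute is far below the quota, which matches the behavior the answer describes.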

    To summarize, the rate limit is estimated from a short time period; it is not the sum of actual requests received over a minute. This is true for all Azure Cognitive Services, and error 429 is returned when the service sees the limit being breached. Follow the best practices to avoid this error and stay within your allocated quota.
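One widely used practice for handling error 429 is to retry with exponential backoff. The sketch below is illustrative only: `RateLimited` is a hypothetical stand-in for whatever exception your HTTP client raises on a 429 response, and `send` is any zero-argument callable that performs the request. If the response carried a retry-after hint, the sketch honors it; otherwise it doubles the delay on each attempt.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a client's HTTP 429 error."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, if the server supplied a hint


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited as err:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% jitter
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay * 0.1)
            sleep(delay)
```

The `sleep` parameter exists only so the waiting behavior can be replaced in tests. Spreading requests evenly over time, rather than sending them in bursts, reduces how often this retry path is needed.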

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further questions, do let us know.

    1 person found this answer helpful.
