How do Azure OpenAI rate limits work?

Question

JAIN Saumya 20

For Requests Per Minute (RPM), is each minute a fixed clock window (for example, 12:00 to 12:01), or is it a sliding window? For example, if I send a request at 12:02:10, is the minute counted from 12:02:10 to 12:03:10?

Accepted answer

Answer 1

romungi-MSFT 48,911 Microsoft Employee Moderator

@JAIN Saumya RPM rate limits are based on the number of requests received over time. Azure OpenAI evaluates the rate of incoming requests over a short interval, typically 1 or 10 seconds, and then determines whether the rate limit is being exceeded. If it estimates that the rate would exceed the limit, it returns error 429. See the documentation for a better understanding of how this works.
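
To make the short-interval evaluation concrete, here is a minimal sketch. It assumes the per-minute quota is spread evenly across the evaluation interval; the exact algorithm the service uses is not stated in this thread. The function names and the 720 RPM figure are hypothetical and used only for illustration.

```python
def per_window_allowance(rpm_limit: int, window_seconds: int) -> float:
    """Requests that fit in one evaluation window if the RPM quota is spread evenly."""
    return rpm_limit * window_seconds / 60


def burst_exceeds(requests_in_window: int, rpm_limit: int, window_seconds: int) -> bool:
    """True if a burst inside one window is above the evenly spread allowance."""
    return requests_in_window > per_window_allowance(rpm_limit, window_seconds)


rpm_limit = 720  # hypothetical deployment limit, for illustration only

print(per_window_allowance(rpm_limit, 1))   # 12.0
print(per_window_allowance(rpm_limit, 10))  # 120.0
print(burst_exceeds(50, rpm_limit, 1))      # True
```

Under this assumption, 50 requests sent within a single second could be rejected with 429 even though 50 is far below the per-minute figure.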

To summarize, the rate limit is estimated over a short interval; it is not the sum of the actual requests received over a full minute. This is true for all Azure Cognitive Services, and error 429 is returned if the service sees the limit being breached. Follow the best practices to avoid this error and stay within your allocated quota.
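
One widely used client-side practice for handling 429 responses is to retry with exponential backoff. The sketch below is an illustration and is not quoted from this thread or from the documentation. `call_with_backoff` and the `send` callable are hypothetical names; `send` stands in for whatever function issues the request and returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send(); while it returns HTTP 429, wait and retry with growing delays."""
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # The delay doubles on each attempt; a little jitter avoids synchronized retries.
        sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
        status, body = send()
    return status, body


# Demo with a fake sender that is rate limited twice, then succeeds.
# The delays are recorded instead of slept so the demo finishes instantly.
responses = iter([(429, "rate limited"), (429, "rate limited"), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result)       # (200, 'ok')
print(len(delays))  # 2
```

Spreading requests evenly over the minute, rather than sending them in bursts, reduces how often the retry path is needed at all.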

If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

JAIN Saumya 20 Reputation points

2023-07-26T03:50:46.51+00:00

I understand how the limits work now. However, my organisation's account only mentions the TPM limit, which is 120K tokens per minute. Would it work the same way for tokens, or are there RPM limits that I am not aware of?

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-07-26T05:14:22.17+00:00

When you create a deployment, a Requests-Per-Minute (RPM) rate limit is also enforced. Its value is set proportionally to the TPM assignment, using the following ratio:

6 RPM per 1000 TPM.
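
As a worked example, the ratio can be applied to the 120K TPM figure from the question. This is a small arithmetic sketch; the function and constant names are hypothetical.

```python
RPM_PER_1000_TPM = 6  # ratio stated in this answer


def rpm_from_tpm(tpm: int) -> int:
    """Derive the RPM limit implied by a TPM assignment using the 6-per-1000 ratio."""
    return tpm * RPM_PER_1000_TPM // 1000


print(rpm_from_tpm(120_000))  # 720
```

So a deployment with 120K TPM would be expected to carry an RPM limit of 720, assuming the ratio above applies to it.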

You should be able to see the RPM value when creating the deployment in Azure OpenAI Studio.

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-07-28T05:06:58.3133333+00:00

@JAIN Saumya Did the above response help answer your query? Thanks!!

JAIN Saumya 20 Reputation points

2023-07-31T06:16:41.8+00:00

As for the text-davinci-003 model, it says the max request tokens is 4097. Where does this limit factor in?

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-07-31T14:03:13.8766667+00:00

This limit refers to the maximum input token limit of the model for a single request.

Tokens-Per-Minute (TPM) allocation is not related to the max input token limit of a model. Model input token limits are defined in the models table and are not impacted by changes made to Tokens per minute (TPM).
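
The independence of the two limits can be sketched as two separate checks. This is a simplified illustration: the function names are hypothetical, and the TPM check below uses a plain one-minute total, whereas actual enforcement is estimated over shorter intervals as described earlier in this thread.

```python
from typing import Iterable

MODEL_MAX_REQUEST_TOKENS = 4097  # text-davinci-003 figure quoted in the question
TPM_QUOTA = 120_000              # deployment quota quoted earlier in the thread


def request_fits_model(request_tokens: int, model_limit: int = MODEL_MAX_REQUEST_TOKENS) -> bool:
    """Per-request check: depends only on the model limit, never on TPM."""
    return request_tokens <= model_limit


def minute_fits_quota(token_counts: Iterable[int], tpm_quota: int = TPM_QUOTA) -> bool:
    """Simplified per-minute check: total tokens across requests against the TPM quota."""
    return sum(token_counts) <= tpm_quota


# A single 5000-token request is too large for the model, even though 5000 < 120000.
print(request_fits_model(5000))        # False

# Forty 4000-token requests each fit the model, but together exceed the TPM quota.
batch = [4000] * 40
print(all(request_fits_model(t) for t in batch))  # True
print(minute_fits_quota(batch))                   # False
```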
