Rate Limits in Azure OpenAI Service - how does it work?

Filip Dratwinski
2023-05-10T13:46:27.5266667+00:00

Hi,

The official documentation for the Azure OpenAI Service states that the rate limit for the ChatGPT model is 300 requests per minute: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quotas-limits.

I expected this rate limit to be fixed and calculated per minute: if I sent 300 requests in the first 30 seconds of a minute, I would not be able to send any more requests for the remaining 30 seconds and would need to wait. This is the behaviour I saw in the original OpenAI API.
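The fixed-window behaviour expected here can be sketched as a simple counter that resets at the start of each minute. This is a minimal illustration assuming windows aligned to whole minutes; `FixedWindowLimiter` is a hypothetical name, and the sketch makes no claim about how OpenAI or Azure actually implement their limits.

```python
import math


class FixedWindowLimiter:
    """Allows at most `limit` requests per fixed window of `window_s` seconds.

    Windows are aligned to multiples of `window_s` (0-60 s, 60-120 s, ...),
    so a client that spends its whole budget early must wait for the next one.
    """

    def __init__(self, limit: int, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self.current_window = -1
        self.count = 0

    def allow(self, t: float) -> bool:
        window = math.floor(t / self.window_s)
        if window != self.current_window:
            # A new window has started: reset the counter.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowLimiter(limit=300, window_s=60.0)
# 300 requests sent during the first 30 seconds are all accepted...
first = [limiter.allow(i * 0.1) for i in range(300)]
# ...but any further request before t = 60 s is rejected.
late = limiter.allow(45.0)
# Once the next window begins, requests are accepted again.
next_window = limiter.allow(60.0)
print(all(first), late, next_window)
```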

Today I learned that the Azure OpenAI Service works differently. If I send 30 requests in parallel, some of them fail with a "rate limit exceeded" error. Also, the Metrics tab shows that the rate limit is dynamic and oscillates around 300.
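One possible explanation, offered only as a hypothesis to check against the documentation, is that the per-minute limit is enforced over shorter sub-windows, with the budget scaled down proportionally. The arithmetic below assumes candidate window lengths of 1, 5, and 10 seconds (the ones asked about in this question); none of them is confirmed here, and `per_window_budget` is a hypothetical helper.

```python
def per_window_budget(rpm: int, window_s: float) -> float:
    """Share of a per-minute limit that fits in a shorter window of `window_s` seconds."""
    return rpm * window_s / 60.0


RPM = 300    # documented limit for the ChatGPT model
BURST = 30   # requests sent in parallel, i.e. all at (nearly) the same instant

# Hypothetical evaluation windows; 60 s is the plain per-minute reading.
for window_s in (1, 5, 10, 60):
    budget = per_window_budget(RPM, window_s)
    status = "exceeds" if BURST > budget else "fits within"
    print(f"{window_s:>2} s window: budget {budget:g} requests, a burst of {BURST} {status} it")
```

Under that assumption, a burst of 30 requests would exceed a 1-second budget (5 requests) or a 5-second budget (25 requests) but would fit within a 10-second budget (50 requests).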


Could you explain exactly how the rate limit works in the Azure OpenAI Service? Is it calculated over 1-second, 5-second, or 10-second windows? How can I make sure I am using the rate limit efficiently in this situation? It would also be helpful if the documentation explained this in more depth. Thanks in advance for your reply.
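Whatever the exact evaluation window turns out to be, a common client-side way to use a per-minute limit efficiently is to spread requests evenly instead of sending them in bursts, and to retry with exponential backoff when the service answers HTTP 429 (Too Many Requests). The sketch below is a generic technique under stated assumptions, not Azure's documented algorithm: `PacedClient` and the `send` callable are hypothetical, `rpm=300` comes from the quota quoted above, and the retry count and backoff base are arbitrary choices.

```python
import threading
import time
from typing import Callable


class PacedClient:
    """Client-side throttle: starts at most one request every 60/rpm seconds
    and retries with exponential backoff when the server answers HTTP 429.

    `send` is a hypothetical stand-in for the real API call; it must return
    the HTTP status code of the response.
    """

    def __init__(
        self,
        send: Callable[[], int],
        rpm: int = 300,
        max_retries: int = 5,
        base_backoff_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.send = send
        self.interval_s = 60.0 / rpm  # 0.2 s between request starts at 300 RPM
        self.max_retries = max_retries
        self.base_backoff_s = base_backoff_s
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_slot = clock()

    def _wait_for_slot(self) -> None:
        # Reserve the next free start time under the lock so parallel threads
        # get distinct slots, then sleep outside the lock until that time.
        with self.lock:
            now = self.clock()
            slot = max(now, self.next_slot)
            self.next_slot = slot + self.interval_s
        delay = slot - now
        if delay > 0:
            self.sleep(delay)

    def call(self) -> int:
        status = 429
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            status = self.send()
            if status != 429:
                return status
            if attempt < self.max_retries:
                # Rate limited: wait 1 s, 2 s, 4 s, ... before the next attempt.
                self.sleep(self.base_backoff_s * 2 ** attempt)
        return status
```

With `rpm=300` the client starts at most one request every 0.2 seconds, so 30 calls made in parallel (for example from a thread pool sharing one `PacedClient`) are spread over roughly 6 seconds instead of arriving at once. The injectable `clock` and `sleep` parameters exist only so the pacing can be tested without real waiting.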

Azure OpenAI Service