Rate Limits in Azure OpenAI Service - how does it work?

Question

Filip Dratwinski 110 Reputation points

Hi,

The official docs for the Azure OpenAI service state that the rate limit for the ChatGPT model is 300 requests per minute: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quotas-limits.

I expected this rate limit to be fixed and calculated per minute: if I send 300 requests in the first 30 seconds of a minute, I cannot send any more for the remaining 30 seconds and have to wait (this is the behaviour I saw in the original OpenAI API).

Today I learned that the Azure OpenAI service behaves differently. If I send 30 requests in parallel, some of them fail with a rate-limit-exceeded error. Also, on the Metrics tab I can see that the Rate Limit is dynamic and oscillates around 300.

[Image: user's screenshot of the Metrics tab]

Could you explain exactly how the Rate Limit works in the Azure OpenAI service? Is it calculated per second, per 5 seconds, or per 10 seconds? How can I make sure I am using the Rate Limit efficiently in this situation? It would also be nice to have this explained in more depth in the documentation. Thanks in advance for the reply.

VasaviLankipalle-MSFT 18,676 Reputation points Moderator

2023-05-10T23:22:35.8666667+00:00

Hi @Filip Dratwinski, thanks for using the Microsoft Q&A Platform.

Sorry for any confusion. Generally, the suggestion is 5 calls per second, or 50 calls every 10 seconds.

Please note that for a short time you might still see Error 429 even if you stay within this limit.
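Since a 429 can still show up briefly even within the suggested pace, a client usually needs a retry path. The following is a minimal sketch in Python, not an official client: `call_with_backoff` and the `send` callable are hypothetical names, `send` is assumed to return a `(status_code, headers, body)` tuple, and the `Retry-After` header is assumed to be present on some 429 responses with a value in seconds. When the header is missing, the sketch falls back to exponential backoff with a little jitter.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out.

    `send` is a hypothetical callable returning (status_code, headers, body),
    where headers is a plain dict. It stands in for the real HTTP request.
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Assumption: the server may send Retry-After as a number of seconds.
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 100 ms of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        sleep(delay)
    raise RuntimeError("still rate limited after %d retries" % max_retries)
```

The `sleep` parameter is injectable only so the behaviour can be checked without actually waiting; in normal use the default `time.sleep` applies.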

Thank you for your feedback regarding the documentation update. We have already shared it with the PG team.

I hope this helps.

Regards,
Vasavi
Filip Dratwinski 110 Reputation points

2023-05-11T05:19:14.2166667+00:00

Hi Vasavi,

Why 15 calls every 10 seconds? If the limit is 300 per minute, doesn't that mean I should be able to send 300/6 = 50 calls in each 10-second interval?

Also, returning to my initial question about the dynamic rate limit: why do the metrics show it as a dynamic value? Is it changing constantly? If so, why? What effect does it have?
VasaviLankipalle-MSFT 18,676 Reputation points Moderator

2023-05-11T22:48:14.8766667+00:00

@Filip Dratwinski, my apologies. To correct my previous reply as well: the suggestion is 5 calls per second, or 50 calls every 10 seconds.
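One way to stay within a pace such as 5 calls per second is to throttle on the client side, so that parallel bursts are spread out before they reach the service. The sketch below is an illustration under assumptions, not a description of how the service itself enforces the limit: `SlidingWindowLimiter` is a hypothetical helper that allows at most `max_calls` in any `window_seconds` window and blocks the caller otherwise. It is single-threaded; sharing it across threads would need a lock.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any sliding window of window_seconds."""

    def __init__(self, max_calls, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.max_calls:
                self.sent.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.sent[0]))
```

For the figures quoted in this thread, `SlidingWindowLimiter(5, 1.0)` corresponds to 5 calls per second and `SlidingWindowLimiter(50, 10.0)` to 50 calls every 10 seconds; calling `acquire()` before each request paces the traffic accordingly.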
Filip Dratwinski 110 Reputation points

2023-05-12T07:40:58.7733333+00:00

@VasaviLankipalle-MSFT Okay. There is still one question left from my side: why does the Rate Limit shown on the Metrics tab keep changing?
Jack Liu 30 Reputation points

2023-06-04T04:09:48.83+00:00

I have the same question about understanding the metric. Thanks.
David Collien 10 Reputation points

2023-06-22T06:19:16.9866667+00:00

I have the same question. What is the ratelimit metric?
JOSEPH Christian 0 Reputation points

2024-03-11T15:25:54.9233333+00:00

@Filip Dratwinski 300 requests per minute? What's the meaning of "K"? I thought it was 300,000 requests/min.
