Why do I get a 429 saying I should retry in 24h in the OpenAI S0 pricing tier?

Sorin Costea 5

All documents talk about quotas per MINUTE, yet the error I get says "Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-10-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 86400 seconds."

That is, the NEXT DAY. However, NO DOCUMENT mentions any DAILY limit, and all quotas are per model and per MINUTE anyway, both in the documentation and in the quotas tab in AI Studio.

So I don't know what to do about that error.

YutongTie-MSFT 53,981 Reputation points Moderator

2024-12-07T01:15:06.8266667+00:00

Hello @Sorin Costea

Thanks for reaching out. I checked internally, and this appears to be a known UI issue: the UI sets the default maximum token parameter to the model’s maximum context length. A fix is being released, and I will share more details when I have more information.

In the meantime, could you please share the error message along with the Apim-request-id and any further details?

I will let you know.

Regards,

Yutong

-Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

Answer 1

Max Lacy 345

I understand you are experiencing a rate limit issue when trying to utilize ChatCompletions_Create Operation under Azure OpenAI API version 2024-10-01-preview.

When a deployment is created, the assigned TPM will directly map to the tokens-per-minute rate limit enforced on its inferencing requests. A Requests-Per-Minute (RPM) rate limit will also be enforced whose value is set proportionally to the TPM assignment using the following ratio:

6 RPM per 1000 TPM.

Depending on the configuration of your deployment, your TPM may be set too low. To address the problem, look at increasing your tokens per minute in the Azure AI portal under Deployments | <select deployment> | Edit. This also increases the allowed RPM, so you should hit fewer rate limits.
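As a quick illustration of the 6 RPM per 1000 TPM ratio, here is a minimal sketch. The function name `rpm_for_tpm` and the sample TPM values are made up for the example; only the ratio comes from the text above.

```python
def rpm_for_tpm(tpm: int) -> int:
    """Requests-per-minute limit implied by a TPM assignment (6 RPM per 1,000 TPM)."""
    return tpm * 6 // 1000


# Sample TPM assignments (illustrative values only).
for tpm in (1_000, 30_000, 150_000):
    print(f"{tpm:>7} TPM -> {rpm_for_tpm(tpm)} RPM")
```

At 1,000 TPM this works out to only 6 requests per minute, so a low TPM assignment limits request count as well as token volume.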

Sorin Costea 5 Reputation points

2024-12-06T18:41:30.5066667+00:00

No use copying and pasting text about quotas per MINUTE when the error message is about DAYS.

Max Lacy 345 Reputation points

2024-12-08T19:39:18.08+00:00

Thank you for calling this out. I went back and did some additional testing. It seems the retry time in the message caps out at 86400 seconds.

I lowered my TPM to 1K tokens per minute. I then sent a 1,000 token message to the assistant and got the 86400 second error. After about 1 minute I was able to send another small message. I then raised my TPM to 150K and sent the same 1,000 token message. This time the assistant provided a response.
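Based only on the single test above, where a request succeeded after about a minute despite the 86400 second message, a client could cap how long it waits instead of sleeping for the full suggested time. The sketch below is an assumption, not documented service behaviour: `wait_seconds` is a hypothetical helper and the 60 second cap is an arbitrary choice. If the service really does enforce the longer wait, these retries would simply keep returning 429.

```python
def wait_seconds(retry_after: float, attempt: int, cap: float = 60.0) -> float:
    """Hypothetical helper: how long to wait before retrying after a 429.

    retry_after: the wait suggested by the service, in seconds.
    attempt:     0 for the first retry, 1 for the second, and so on.
    cap:         assumed upper bound on the client-side wait.
    """
    backoff = min(cap, 2.0 ** attempt)  # 1, 2, 4, ... seconds, capped
    # Never wait longer than the service asked, nor longer than our own backoff.
    return min(retry_after, backoff)


# With the 86400 second suggestion, the capped backoff wins.
for attempt in (0, 3, 10):
    print(attempt, wait_seconds(86400, attempt))
```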

In your case it seems you are hitting either the Requests Per Minute or the Tokens Per Minute limit by an amount that pushes the suggested retry time to its maximum.

Below is how each is calculated:

TPM - As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets.

RPM - Rate limits are based on the number of requests received over time. The rate limit expects that requests be evenly distributed over a one-minute period. If this average flow isn't maintained, then requests may receive a 429 response even though the limit isn't met when measured over the course of a minute. To implement this behavior, Azure OpenAI Service evaluates the rate of incoming requests over a small period of time, typically 1 or 10 seconds. If the number of requests received during that time exceeds what would be expected at the set RPM limit, then new requests will receive a 429 response code until the next evaluation period.
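The TPM description above can be sketched as a toy counter. This models only the quoted wording, not the service's actual implementation. The class name `TpmCounter`, the fixed 60 second window, and the idea that the estimate is roughly prompt tokens plus the requested maximum output tokens are all simplifying assumptions. It shows how one request with a very large maximum token setting could use up a small TPM budget for the rest of the minute.

```python
from dataclasses import dataclass


@dataclass
class TpmCounter:
    """Toy model of a per-minute running token count (not the real service)."""
    limit: int                 # assigned tokens per minute
    window_start: float = 0.0  # start of the current one-minute window
    used: int = 0              # estimated tokens counted in this window

    def admit(self, estimated_tokens: int, now: float) -> int:
        # Reset the running count once a minute has elapsed.
        if now - self.window_start >= 60:
            self.window_start = now
            self.used = 0
        # Once the limit has been reached, further requests get a 429.
        if self.used >= self.limit:
            return 429
        self.used += estimated_tokens
        return 200


counter = TpmCounter(limit=1_000)
first = counter.admit(estimated_tokens=200 + 4_096, now=0.0)  # small prompt, large max tokens
second = counter.admit(estimated_tokens=50, now=5.0)          # same minute
third = counter.admit(estimated_tokens=50, now=61.0)          # after the reset
print(first, second, third)  # 200 429 200
```

In this toy model the counter resets after a minute, which matches the "per minute" documentation but does not by itself explain an 86400 second retry message.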

To fix your issue, I would recommend identifying the demand for both RPM and TPM and setting the TPM accordingly via the deployment edit button in Azure OpenAI Studio.

Let me know if this helps and I will update my answer above to reflect the changes.

Sorin Costea 5 Reputation points

2024-12-08T22:51:08.4966667+00:00

Thank you for the analysis. However, a minute is 60 seconds, not 86400, so no "per minute" limit can explain the message.

Max Lacy 345 Reputation points

2024-12-09T17:02:49.8933333+00:00

Apologies, I have not done a good job of directly answering your question.

Your question - "Why do I get a 429 saying I should retry in 24h in the OpenAI S0 pricing tier?"

My suggested answer - You're triggering an RPM or TPM rate limit at a rate high enough to produce the maximum suggested retry time of 1 day (86400 seconds).

Sorin Costea 5 Reputation points

2024-12-11T23:36:43.38+00:00

So there is also an undocumented per-day limit?

Jeffrey Mak 0 Reputation points

2025-01-15T15:13:52.32+00:00

We hit exactly the same issue. We know that the TPM needs to be set properly. My concern is the same as Sorin's: if exceeding the request limit takes "1 day" to recover from, the service CANNOT be used in production.

Can we have a clarification:
(1) Is the "retry after 86400 seconds" message a typo, or does it actually mean that no request will be accepted until 86400 seconds have passed? Yes / No?
