"exceeded token rate limit" for first (short) prompt - gpt 4o

Question

Maxon Rubin-Toles 30

When I attempt to make a call to gpt-4o (either in "Playground" or via a Python script), I get the following error:

Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-04-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 86400 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit. | Apim-request-id: f825bef5-9148-422e-9417-374bca044b39
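For reference, the wait time quoted in an error like the one above can be pulled out programmatically. This is only a minimal sketch: it assumes the message keeps the "retry after N seconds" wording shown here, and `parse_retry_after` is a hypothetical helper name, not part of any SDK.

```python
import re
from typing import Optional


def parse_retry_after(message: str) -> Optional[int]:
    """Return the wait time in seconds from a rate-limit message, or None.

    Assumes the wording "retry after <N> seconds" seen in the error above.
    """
    match = re.search(r"retry after (\d+) seconds", message, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


error_text = (
    "Requests to the ChatCompletions_Create Operation under Azure OpenAI API "
    "version 2024-04-01-preview have exceeded token rate limit of your current "
    "OpenAI S0 pricing tier. Please retry after 86400 seconds."
)

wait_seconds = parse_retry_after(error_text)
wait_hours = wait_seconds / 3600  # 86400 seconds is a full 24 hours
```

Note that 86400 seconds is 24 hours, which is far longer than a per-minute limit would normally suggest.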

This happens even for few-token prompts (e.g., "Hello") after hours of not using OpenAI services at all. Although I am on the S0 pricing tier, my capacity is listed as 1,000 tokens per minute and 6 requests per minute.

For some reason, I can sometimes make calls to GPT-3.5 Turbo, but I receive the same error on even a second prompt in succession, even though I am certainly not exceeding the capacity (which is listed as the same for GPT-3.5 Turbo).

Why am I unable to make calls at the designated rates? I have also checked, and both models are available in my region ("eastus"). Thank you for your help.

Stawsh Murawski 5 Reputation points

2024-06-24T19:15:20.98+00:00

I have essentially the same issue. I am using the Azure OpenAI Assistant, and on my first Run on a thread I get "Rate limit is exceeded. Try again in N seconds." This happens on my first call of the day, both in the AI Studio Playground and from my own Python code. This is gpt-4o in East US.

2 answers

Answer 1

Tim Young 5

I had the same problem and tried switching to an older model (gpt-4-32k rather than gpt-4o) and that seemed to do the trick. I have a feeling that perhaps the newer / higher performance models are not actually generally available? (I had the same problem trying to get a response from gpt-4-turbo.)

I am also on the S0 subscription, still using the initial free credit, so that might have something to do with it too.

Stawsh Murawski 5 Reputation points

2024-06-26T21:31:46.3033333+00:00

I requested a quota increase, which was granted: up to 100 quota (100,000 tokens per minute) and 600 requests per minute. But it made no difference; the failure persists.
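The numbers in this thread (1,000 tokens per minute with 6 requests per minute, and 100 quota with 100,000 tokens per minute and 600 requests per minute) are consistent with one quota unit being 1,000 TPM and 6 RPM per 1,000 TPM. A small sketch of that arithmetic, assuming the ratio holds for the deployment in question; the constant and function names are made up for illustration:

```python
# Ratios inferred from the figures quoted in this thread (an assumption,
# not a guarantee for every model or region).
TPM_PER_QUOTA_UNIT = 1_000
RPM_PER_1000_TPM = 6


def limits_for_quota(quota_units: int) -> tuple[int, int]:
    """Return (tokens_per_minute, requests_per_minute) for a quota value."""
    tpm = quota_units * TPM_PER_QUOTA_UNIT
    rpm = (tpm // 1_000) * RPM_PER_1000_TPM
    return tpm, rpm


original_limits = limits_for_quota(1)     # (1000, 6), as in the question
increased_limits = limits_for_quota(100)  # (100000, 600), as granted here
```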

Answer 2

Yu Qi, Aaron 5

Somehow I was able to work around it by lowering the max_tokens parameter in the Prompty file:

max_tokens: 256

The estimated token count for a request appears to include this parameter. In my testing, if I set it to 1024, the estimated token count is 1024 + xx (where xx is the actual number of tokens in my input).

After lowering it to a smaller number, e.g. 256, the API call went through successfully.
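The observation above can be sketched as a quick pre-flight check. This is a rough approximation under stated assumptions: the service is taken to count roughly "prompt tokens + max_tokens" against the per-minute limit, and the prompt size is guessed with a crude 4-characters-per-token heuristic rather than a real tokenizer. Both function names are hypothetical.

```python
def estimate_request_tokens(prompt: str, max_tokens: int, chars_per_token: int = 4) -> int:
    """Estimate tokens counted for one request: prompt estimate + max_tokens.

    The chars_per_token heuristic is an assumption; use a real tokenizer
    for accurate prompt counts.
    """
    prompt_tokens = max(1, -(-len(prompt) // chars_per_token))  # ceiling division
    return prompt_tokens + max_tokens


def fits_in_quota(prompt: str, max_tokens: int, tpm_limit: int) -> bool:
    """True if a single request's estimate stays within the per-minute token limit."""
    return estimate_request_tokens(prompt, max_tokens) <= tpm_limit


# With the 1,000 tokens-per-minute capacity mentioned in the question:
too_large = fits_in_quota("Hello", max_tokens=1024, tpm_limit=1000)  # False
small_enough = fits_in_quota("Hello", max_tokens=256, tpm_limit=1000)  # True
```

Under these assumptions, a one-word prompt with max_tokens set to 1024 already exceeds a 1,000 TPM deployment on the first call, while 256 leaves room.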
