"exceeded token rate limit" on the first (short) prompt with gpt-4o

Maxon Rubin-Toles 20 Reputation points

When I attempt to make a call to gpt-4o (either in "Playground" or via a Python script), I get the following error:

Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-04-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 86400 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit. | Apim-request-id: f825bef5-9148-422e-9417-374bca044b39
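To put the quoted wait time in perspective, here is a minimal sketch that pulls the "retry after" value out of the error text and converts it to hours. The helper name `parse_retry_after_seconds` is hypothetical and not part of any SDK; it only assumes the message keeps the wording shown above.

```python
import re
from typing import Optional


def parse_retry_after_seconds(message: str) -> Optional[int]:
    """Return the wait time (in seconds) quoted in the error text, or None if absent."""
    match = re.search(r"retry after (\d+) seconds?", message, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


# Error text copied from the message above (truncated after the retry hint).
error_text = (
    "Requests to the ChatCompletions_Create Operation under Azure OpenAI API "
    "version 2024-04-01-preview have exceeded token rate limit of your current "
    "OpenAI S0 pricing tier. Please retry after 86400 seconds."
)

wait_seconds = parse_retry_after_seconds(error_text)
wait_hours = wait_seconds / 3600
print(wait_seconds, wait_hours)  # 86400 24.0
```

A 24-hour wait is far longer than a one-minute window, so the block may not come from the per-minute limits alone.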

This happens even for prompts of only a few tokens (e.g., "Hello"), and even after hours of not using OpenAI services at all. Although I am on the S0 pricing tier, my capacity is listed as 1,000 tokens per minute and 6 requests per minute.
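One thing worth checking against the 1,000 tokens-per-minute figure: Azure's quota documentation describes the rate-limit estimate as taking the prompt, the `max_tokens` setting, and `best_of` into account, not just the prompt text. The sketch below is an assumption about how such an estimate might add up (the exact formula is not given here), and the token counts are made-up illustrative numbers.

```python
TPM_LIMIT = 1_000  # tokens-per-minute capacity listed for the deployment


def estimated_request_tokens(prompt_tokens: int, max_tokens: int, best_of: int = 1) -> int:
    """Assumed upper bound: the prompt plus every completion token that could be generated."""
    return prompt_tokens + max_tokens * best_of


def fits_in_quota(estimate: int, tpm_limit: int = TPM_LIMIT) -> bool:
    """True if a single request's estimate stays within the per-minute token limit."""
    return estimate <= tpm_limit


# Hypothetical numbers: a roughly 5-token prompt with two different max_tokens settings.
small = estimated_request_tokens(prompt_tokens=5, max_tokens=800)
large = estimated_request_tokens(prompt_tokens=5, max_tokens=4096)

print(small, fits_in_quota(small))  # 805 True
print(large, fits_in_quota(large))  # 4101 False
```

Under this assumption, a "Hello" prompt sent with a large `max_tokens` value could be counted as exceeding the limit even though the prompt itself is tiny.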

Oddly, I can sometimes make calls to GPT-3.5 Turbo, but I get the same error as soon as I send a second prompt in succession. I am certainly not exceeding the capacity, which is listed as the same for GPT-3.5 Turbo.
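The failure on a second back-to-back prompt would be consistent with the 6 requests-per-minute limit being enforced evenly over shorter windows rather than over a full minute. That is an assumption here, not something the error message confirms. Under it, the minimum gap between requests works out as follows (both helper names are hypothetical):

```python
def min_spacing_seconds(requests_per_minute: int) -> float:
    """Minimum gap between requests if the per-minute limit is spread evenly."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


def seconds_to_wait(last_request_at: float, now: float, requests_per_minute: int) -> float:
    """How long to pause before the next request; 0.0 if enough time has already passed."""
    elapsed = now - last_request_at
    return max(0.0, min_spacing_seconds(requests_per_minute) - elapsed)


spacing = min_spacing_seconds(6)
print(spacing)                         # 10.0
print(seconds_to_wait(0.0, 3.0, 6))    # 7.0
print(seconds_to_wait(0.0, 12.0, 6))   # 0.0
```

In a script, calling `time.sleep(seconds_to_wait(...))` before each request is one simple way to stay under such a limit.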

Why am I unable to make calls at the designated rates? I have also checked, and both models are available in my region ("eastus"). Thank you for your help.

Azure OpenAI Service

1 answer

  1. Tim Young 5 Reputation points

    I had the same problem. Switching to an older model (gpt-4-32k rather than gpt-4o) seemed to do the trick. I suspect the newer, higher-performance models may not actually be generally available. (I had the same problem trying to get a response from gpt-4-turbo.)

    I am also on the S0 subscription, still using the initial free credit, so that might have something to do with it too.

    1 person found this answer helpful.