Azure OpenAI Error 429 - Request Below Rate Limit

Question


Pedro Daniel Scheeffer Pinheiro 65

I am receiving an Error 429 while using Azure OpenAI, despite the request being below the rate limit.

Region: Sweden Central

The error message reads: "Error code: 429 - {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 86400 seconds.'}}".

My input is two prompts with around 900 tokens, and the max token limit is set to 4000. PTU utilization is at 0%. The error started occurring recently. Can someone help me troubleshoot this issue?
Also, do I really have to wait a full day before trying again?

I tried increasing the delay between requests, but with no luck.

rishita 35 Reputation points

2024-06-16T06:25:13.23+00:00

I am also facing the same issue, even when running it in the Azure OpenAI Studio chat playground.
Bas Hulskamp 40 Reputation points

2024-06-17T06:59:51.0366667+00:00

Same problem here, same region as well. I have also waited the 86400 seconds and the same error still occurs.
AshokPeddakotla-MSFT 35,971 Reputation points Moderator

2024-06-17T16:48:40.99+00:00
Pedro Daniel Scheeffer Pinheiro

I understand that you are below your limit and are still encountering the issue.

To give more context: as each request is received, Azure OpenAI computes an estimated maximum processed-token count that includes the following:

Prompt text and count

The max_tokens parameter setting

The best_of parameter setting

As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets. For more details, see Understanding rate limits.
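The counting rule described above can be sketched as a small Python model. This is a simplified illustration, not the service's actual implementation: the class name `MinuteTokenBucket`, the 1,000 TPM limit used in the example, and the formula `prompt_tokens + max_tokens * best_of` for the estimate are all assumptions. It shows why a single request can be rejected before any tokens have been consumed: the check uses the worst-case estimate, not the tokens actually generated.

```python
from dataclasses import dataclass


@dataclass
class MinuteTokenBucket:
    """Simplified, hypothetical model of a per-deployment tokens-per-minute counter."""

    tpm_limit: int
    window_start: float = 0.0
    used: int = 0

    def try_admit(self, now: float, prompt_tokens: int, max_tokens: int, best_of: int = 1) -> int:
        """Return 200 if the request is admitted, 429 if it would exceed the TPM limit."""
        # The running count resets once a full minute has passed since the window began.
        if now - self.window_start >= 60.0:
            self.window_start = now
            self.used = 0
        # Assumed estimate: prompt tokens plus the worst-case completion budget.
        estimate = prompt_tokens + max_tokens * best_of
        if self.used + estimate > self.tpm_limit:
            return 429  # rejected; nothing is added to the running count
        self.used += estimate
        return 200
```

With the question's numbers (about 900 prompt tokens and a 4000-token maximum) and a hypothetical 1,000 TPM deployment, `MinuteTokenBucket(tpm_limit=1000).try_admit(0.0, 900, 4000)` returns 429 on the very first call, while a request with a much smaller max_tokens budget is admitted.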

To minimize issues related to rate limits, it's a good idea to use the following techniques:

Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don’t set a large max_tokens value if you expect your responses to be small.

Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.

Implement retry logic in your application.

Avoid sharp changes in the workload. Increase the workload gradually.

Test different load increase patterns.
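For the "Implement retry logic" item, here is a minimal sketch of capped exponential backoff with jitter in plain Python. The helper name `retry_with_backoff` and its default values are hypothetical choices, not part of any Azure SDK. It deliberately caps each wait at `max_delay` instead of honoring a retry hint of 86400 seconds. Given the mechanism described above, it is likely to help only when the 429 comes from the per-minute window filling up, not when a single request's estimate already exceeds the deployment's limit.

```python
import random
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Type[BaseException],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` with capped exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            # Exponential backoff: base, 2*base, 4*base, ... never above max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Randomize within [delay/2, delay] so parallel clients don't retry in lockstep.
            sleep(random.uniform(delay / 2, delay))
    raise RuntimeError("max_retries must be >= 0")
```

With the OpenAI Python SDK (v1), you could pass `openai.RateLimitError` as `retry_on` and wrap the call in a lambda, e.g. `retry_with_backoff(lambda: client.chat.completions.create(...), openai.RateLimitError)`.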

Also, see A Guide to Limits, Quotas, and Best Practices for more details.

Hope this helps. Do let me know if you have any further queries.
Bas Hulskamp 40 Reputation points

2024-06-18T07:19:54.0033333+00:00
Good morning Ashok,

I have the same issue as OP, but as far as I know I make only one call to the API. I use API version 2024-05-01-preview to create embeddings with a deployed model.

I use the LangChain library (langchain) and Chroma (langchain-chroma) in Python. This is how I create the embeddings:

    from langchain_chroma import Chroma
    from langchain_core.documents import Document
    from langchain_openai import AzureOpenAIEmbeddings

    self.embedding_function = AzureOpenAIEmbeddings(
        model="text-embedding-ada-002",
        api_key=az_creds["key"],
        azure_endpoint=az_creds["ep"],
        azure_deployment=az_creds["emb_dn"],
    )
    chunks: list[Document] = ...  # gets a value somewhere in my code
    Chroma.from_documents(chunks, self.embedding_function)

After running this, I still get the 429 from the API, despite only calling it once in my whole application and not calling it anywhere else. I have waited the 86400 seconds (24 hours), but still the same error. Seems like an API issue to me.

The error I get is a slight variation on OP's error though:

openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2024-05-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 86400 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
AshokPeddakotla-MSFT 35,971 Reputation points Moderator

2024-06-25T09:58:50.18+00:00

Bas Hulskamp, if the above suggestions didn't help, please create a support request so that the team can look into this internally and provide a resolution as soon as possible.
Vivek 10 Reputation points

2024-07-01T05:01:50.6933333+00:00

I'm getting exactly the same issue, even with Pay-as-you-go and on my very first Azure OpenAI "Bring your own data" request. Is there any resolution or more information on this issue?
Max 0 Reputation points

2024-07-04T12:40:22.19+00:00

I've been facing the same issue with the embeddings and gpt-4o endpoints. Even though the token limit is supposed to be per minute, it doesn't always behave that way; for me it just starts working again after a while, and I don't have to wait the stated 24 hours.
Leonardo José da Silva 0 Reputation points

2025-02-09T05:15:00.35+00:00

Hello, to solve the problem, just increase your model's tokens-per-minute limit in the deployment settings in AI Foundry.


Answer 1

Chris Hoder - MSFT 101 Microsoft Employee Moderator

Hi, could you open a support request? That will let us debug the behavior with your specific requests.

Thanks!

Bas Hulskamp 40 Reputation points

2024-06-14T09:20:40.4766667+00:00

Hi @Chris Hoder - MSFT,

I seem to have the exact same problem. Is there perhaps already more information on this?

Answer 2

Jessie Chen 60

I encountered the same problem and resolved it by increasing the Tokens per Minute rate limit.

For your reference: https://learn.microsoft.com/en-us/answers/questions/1845382/azure-openai-chatbot-server-responded-with-status?orderby=helpful

Answer 3

Hey,

What helped me was simply syncing the two limits: the one configured on my Azure deployment and the max_tokens value in my Python code.

AZURE DEPLOYMENT (Free Tier, S0, gpt-35-turbo-16k)

CODE BEFORE (giving the 429 error)

            response = client.chat.completions.create(
                model=azure_oai_deployment,
                temperature=0.7,
                max_tokens=1200,  # with this value the request returned the 429 below
                messages=messages_array,
            )

Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-06-01 have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 86400 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}

CODE AFTER (working fine)

            response = client.chat.completions.create(
                model=azure_oai_deployment,
                temperature=0.7,
                max_tokens=1000,  # lowered so the request fits the deployment's limit
                messages=messages_array,
            )
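The idea of keeping max_tokens in line with the deployment's limit can be expressed as a small helper. This is a hypothetical sketch: `max_tokens_within_budget` is not an SDK function, `tpm_limit` is whatever tokens-per-minute value your deployment shows, `prompt_tokens` must come from your own token count (for example with a tokenizer library), and the estimate `prompt_tokens + max_tokens * best_of` is an assumed approximation of the service's calculation.

```python
def max_tokens_within_budget(
    tpm_limit: int, prompt_tokens: int, requested: int, best_of: int = 1
) -> int:
    """Largest max_tokens <= requested whose assumed estimate fits the TPM budget."""
    if best_of < 1:
        raise ValueError("best_of must be >= 1")
    # Assumed rule: prompt_tokens + max_tokens * best_of must not exceed tpm_limit.
    headroom = tpm_limit - prompt_tokens
    if headroom <= 0:
        raise ValueError("prompt alone exceeds the deployment's per-minute token budget")
    # Each of the best_of candidate completions may use up to max_tokens.
    return min(requested, headroom // best_of)
```

For instance, with a hypothetical 1,000 TPM deployment and a 150-token prompt, `max_tokens_within_budget(1000, 150, 1200)` returns 850. Raising the deployment's TPM, as suggested in the other answers, is the alternative when you need longer completions.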
