Hi Jiajun Tan, greetings and welcome to the Microsoft Q&A forum!
I understand that when using the GPT-4o/GPT-4o-mini models, sending even a small piece of text returns a 429 error, indicating that the request has hit the TPM limit. The same text, however, does not trigger the error with GPT-3.5-Turbo, whose responses stay well below the TPM limit (1K tokens per minute).
Could you please double-check in Azure OpenAI Studio whether the rate limit for these model deployments has actually been reached?
If the usage has reached the limit, you will need to increase the quota.
Please note that different model deployments have unique max TPM values. This represents the maximum TPM that can be allocated to that type of model deployment in a given region. See Manage Azure OpenAI Service quota for more details.
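Until the quota can be raised, a common client-side mitigation is to retry 429 responses with exponential backoff, honoring the server's suggested wait time when one is provided. Below is a minimal sketch of such a retry helper; the function names and the use of a `retry_after` attribute are illustrative assumptions, not the exact exception shape of any particular SDK:

```python
import time
import random

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on rate-limit errors with exponential backoff.

    RuntimeError stands in for the SDK's rate-limit exception here; a
    `retry_after` attribute (seconds), when present, is honored directly.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError as e:  # hypothetical stand-in for a 429 error
            retry_after = getattr(e, "retry_after", None)
            if retry_after is not None:
                delay = retry_after
            else:
                # exponential backoff with a little jitter
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
    return fn()  # final attempt; let any exception propagate to the caller
```

You would wrap your chat-completion call in a small lambda or function and pass it to `call_with_backoff`, so transient 429s are absorbed instead of failing the request immediately.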
Also, could you try a deployment in another region to help isolate the issue?
Since you mentioned that you do not have any issues when using the SDK, the error could also be caused by the third-party service you are using.
Do let me know if you have any further queries.