Hi Jiajun Tan, greetings and welcome to the Microsoft Q&A forum!
I understand that when using the GPT-4o/GPT-4o-mini models, sending even a small piece of text returns a 429 error, indicating that the request has hit the TPM limit. The same text, however, does not trigger the error with GPT-3.5-Turbo, whose responses stay well below the TPM limit (1K tokens per minute).
Could you please double-check in Azure OpenAI Studio whether the rate limit for these model deployments has actually been reached?
If the usage has reached the limit, you will need to increase the quota.
Please note that different model deployments have unique max TPM values. This represents the maximum TPM that can be allocated to that type of model deployment in a given region. See Manage Azure OpenAI Service quota for more details.
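Until the quota can be raised, a common client-side mitigation is to retry 429 responses with exponential backoff, honoring the server's suggested wait time when one is provided. Below is a minimal sketch of such a retry helper; the function names and the use of a `retry_after` attribute are illustrative assumptions, not the exact exception shape of any particular SDK:

```python
import time
import random

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on rate-limit errors with exponential backoff.

    RuntimeError stands in for the SDK's rate-limit exception here; a
    `retry_after` attribute (seconds), when present, is honored directly.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError as e:  # hypothetical stand-in for a 429 error
            retry_after = getattr(e, "retry_after", None)
            if retry_after is not None:
                delay = retry_after
            else:
                # exponential backoff with a little jitter
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
    return fn()  # final attempt; let any exception propagate to the caller
```

You would wrap your chat-completion call in a small lambda or function and pass it to `call_with_backoff`, so transient 429s are absorbed instead of failing the request immediately.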
Also, could you try a deployment in another region to help isolate the issue?
Since you mentioned that you do not have any issues when using the SDK, the error could also be caused by the third-party service you are using.
Do let me know if you have any further queries.