OpenAI Usage Never Goes Down - Stuck at 100%

hakan458 5

I have 2 separate OpenAI deployments, each with a 30k tokens-per-minute (TPM) rate limit. However, it seems that once I hit this rate limit, the usage never goes down again. I have checked in the metrics portal that there have been zero HTTP requests and zero tokens processed in the past hour, for example. Yet I see 30 / 30 under Usage / Limit on the Quotas page. How is this possible? It makes the deployments completely unusable. There must be something I am doing wrong with the deployment(s).

The metrics chart is for the past 1 hour.

User's image

EDIT:

I recreated the deployment with 20k TPM and now see that Usage / Limit is 20 / 30, so I understand that this is just the amount of quota allocated to the deployment, not the amount being used at this time. However, with any single request I hit the error below, even though I am sending very few requests with small data. Any ideas?

httpx.HTTPStatusError: Client error '429 Too Many Requests' for url 'https://xxxxxx.openai.azure.com//openai/deployments/gpt4omini/chat/completions?api-version=2024-06-01'

hakan458 5 Reputation points

2024-08-07T20:54:04.09+00:00

I deleted my gpt-4o-mini deployment and recreated it with a 20k TPM limit. Now I see my usage is 20 / 30, so if I understand correctly, the Quotas page just shows how much of the available quota is allocated to a deployment, not how much is actually being processed by the deployment. That makes sense.

However, when I first deploy a model and use it, it is very fast and I get results back with no problem. After running a small batch of data a few times, though, it becomes unresponsive and I don't get back any output. It seems this is the real issue I have to figure out. Any ideas there?
hakan458 5 Reputation points

2024-08-07T21:22:22.72+00:00
I recreated the deployment with 20k TPM and now see that Usage / Limit is 20 / 30, so I understand that this is just the amount of quota allocated to the deployment, not the amount being used at this time. However, with any single request I hit the error below, even though I am sending very few requests with small data. Any ideas?

httpx.HTTPStatusError: Client error '429 Too Many Requests' for url 'https://xxxxxx.openai.azure.com//openai/deployments/gpt4omini/chat/completions?api-version=2024-06-01'
AshokPeddakotla-MSFT 35,971 Reputation points Moderator

2024-08-08T07:54:23.3033333+00:00
hakan458 Greetings & Welcome to Microsoft Q&A forum!

I understand that you have recreated the deployment but are still seeing the 429 error.

As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets.
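
As a rough illustration of the counting behaviour described above, the sketch below models a per-minute token counter in Python. The class name, the fixed 60-second window, and the exact point at which a request is rejected are assumptions made for illustration; it is not the service's actual implementation.

```python
import time
from typing import Optional


class MinuteTokenCounter:
    """Toy model of a tokens-per-minute limit (hypothetical, for illustration only)."""

    def __init__(self, tpm_limit: int) -> None:
        self.tpm_limit = tpm_limit
        self.window_start: Optional[float] = None  # start of the current one-minute window
        self.running_count = 0  # sum of estimated tokens admitted in this window

    def admit(self, estimated_tokens: int, now: Optional[float] = None) -> int:
        """Return 200 if the request is admitted, 429 if the limit was already reached."""
        now = time.monotonic() if now is None else now
        # Assumption: the counter resets 60 seconds after the window opened.
        if self.window_start is None or now - self.window_start >= 60:
            self.window_start = now
            self.running_count = 0
        # Once the running count has reached the limit, further requests are rejected.
        if self.running_count >= self.tpm_limit:
            return 429
        # The estimate (not the tokens actually generated) is what gets counted.
        self.running_count += estimated_tokens
        return 200


counter = MinuteTokenCounter(tpm_limit=20_000)
statuses = [
    counter.admit(15_000, now=0),   # admitted, running count becomes 15,000
    counter.admit(6_000, now=10),   # admitted, running count becomes 21,000
    counter.admit(100, now=20),     # rejected, limit already reached
    counter.admit(100, now=61),     # admitted, window has reset
]
print(statuses)
```

In this model the count grows by the estimated maximum for each request, so a request with a large max_tokens value can use up most of the window even if the response turns out to be short.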

To minimize issues related to rate limits, it's a good idea to use the following techniques:

Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don’t set a large max_tokens value if you expect your responses to be small.

Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.

Implement retry logic in your application.

Avoid sharp changes in the workload. Increase the workload gradually.

Test different load increase patterns.
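
For the retry suggestion in the list above, here is a minimal Python sketch of one possible approach: retry on 429, wait for the Retry-After header value when the response carries one, and otherwise fall back to exponential backoff with jitter. The function name call_with_retry and its parameters are hypothetical, and send is assumed to be any zero-argument callable returning an object with status_code and headers attributes (an httpx.Response has both).

```python
import random
import time
from typing import Any, Callable


def call_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-429 response or attempts run out."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last 429 to the caller
        # Prefer the server's hint when it is a plain number of seconds.
        retry_after = response.headers.get("Retry-After")
        try:
            delay = float(retry_after)
        except (TypeError, ValueError):
            # No usable header: exponential backoff plus random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
    return response
```

With httpx this could be used as call_with_retry(lambda: client.post(url, json=payload)), where client, url, and payload are your own objects. Check the returned status code before calling raise_for_status(), since the last response may still be a 429.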

Please see Manage Azure OpenAI Service quota and Optimizing Azure OpenAI: A Guide to Limits, Quotas, and Best Practices for more information.

Do let me know if that helps.
hakan458 5 Reputation points

2024-08-08T18:58:17.8066667+00:00

@AshokPeddakotla-MSFT The thing is it does not reset every minute. Even if I have no activity for an hour, I hit this error immediately. When I first created the 4o-mini deployment it worked great for a few minutes, then just like my 3.5 turbo deployment it gets into this state that is unusable.
hakan458 5 Reputation points

2024-08-09T16:39:56.4633333+00:00

Here is an example: the total of all tokens is much less than 30k, yet I hit rate limits immediately.
AshokPeddakotla-MSFT 35,971 Reputation points Moderator

2024-08-13T09:15:08.53+00:00

hakan458

The thing is it does not reset every minute. Even if I have no activity for an hour, I hit this error immediately. When I first created the 4o-mini deployment it worked great for a few minutes, then just like my 3.5 turbo deployment it gets into this state that is unusable.

Thanks for sharing the additional details. This issue needs further investigation to find out the root cause. For a deeper investigation and immediate assistance on this issue, please file a support request @ https://aka.ms/azsupt?
