OpenAI Usage Never Goes Down - Stuck at 100%

hakan458 5 Reputation points
2024-08-07T20:16:52.3166667+00:00

I have two separate OpenAI deployments, each with a rate limit of 30k tokens per minute. However, once I hit this rate limit, the usage never seems to go down again. I have checked in the metrics portal that there were zero HTTP requests and zero tokens processed in the past hour, for example. Yet I see 30 / 30 under Usage / Limit on the Quotas page. How is this possible? It makes the deployments completely unusable. There must be something I am doing wrong with the deployment(s).

The metrics chart is for the past 1 hour.

EDIT:

I recreated the deployment with 20k TPM and now see that Usage / Limit is 20 / 30, so I understand that this figure is just the quota allocated to the deployment, not what is being consumed at the moment. However, every single request fails with the error below, even though I am sending very few requests with small payloads. Any ideas?

httpx.HTTPStatusError: Client error '429 Too Many Requests' for url 'https://xxxxxx.openai.azure.com//openai/deployments/gpt4omini/chat/completions?api-version=2024-06-01'
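
For reference, a common client-side way to cope with a 429 is to wait and retry. The sketch below is only an illustration under stated assumptions: it assumes the 429 response may carry a Retry-After header giving a delay in seconds, and it falls back to exponential backoff when the header is missing or unparsable. The names retry_delay, call_with_retry and send are hypothetical helpers, not part of httpx or of any Azure SDK, and send stands in for whatever function actually issues the request. It does not explain why a single small request would be throttled in the first place.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return seconds to wait before the next attempt.

    Prefers the server's Retry-After value (assumed to be in seconds);
    otherwise uses exponential backoff: base * 2**attempt, capped at `cap`.
    """
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, min(float(value), cap))
        except ValueError:
            pass  # Not a plain number of seconds; fall back to backoff.
    return min(base * (2 ** attempt), cap)


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    response = send()
    for attempt in range(max_attempts - 1):
        if response[0] != 429:
            break
        sleep(retry_delay(response[1], attempt))
        response = send()
    return response
```

With httpx, send could wrap the existing POST call and return (response.status_code, response.headers, response.text). If every attempt still returns 429 while the metrics show no traffic, retrying will not help and the cause is more likely on the quota or deployment side.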

Azure OpenAI Service