OpenAI Usage Never Goes Down - Stuck at 100%
I have 2 separate OpenAI deployments, each with a rate limit of 30k tokens per minute (TPM). However, once I hit this rate limit, the usage never seems to go down again. The metrics portal shows zero HTTP requests and zero processed tokens over, for example, the past hour, yet the Quotas page still shows 30 / 30 under Usage / Limit. How is this possible? It makes the deployments completely unusable. I must be doing something wrong with the deployment(s).
The metrics chart covers the past hour.
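The EDIT below suggests that the Usage / Limit column counts TPM *allocated* to deployments, not tokens consumed. The sketch below makes that bookkeeping concrete. It is a hypothetical helper, not part of any Azure SDK. It assumes a regional limit of 30k TPM and uses the 6 RPM per 1000 TPM ratio from Azure's quota documentation, which may differ by model or offer.

```python
# Hypothetical illustration: the Quotas page "Usage" reflects TPM assigned to
# deployments, so it stays constant even when no traffic is sent.
RPM_PER_1000_TPM = 6  # ratio documented by Azure at time of writing (assumption)


def quota_summary(regional_limit_tpm: int, deployments: dict[str, int]) -> dict:
    """Summarize allocated vs. remaining regional quota for a set of deployments."""
    allocated = sum(deployments.values())
    return {
        "allocated_tpm": allocated,
        "remaining_tpm": regional_limit_tpm - allocated,
        # Each deployment also gets a requests-per-minute cap derived from its TPM.
        "rpm_per_deployment": {
            name: tpm * RPM_PER_1000_TPM // 1000 for name, tpm in deployments.items()
        },
    }


summary = quota_summary(30_000, {"gpt4omini": 20_000})
print(summary)
```

With one 20k TPM deployment against a 30k limit, this reports 20k allocated and 10k remaining, which matches the "20 / 30" shown after recreating the deployment. It also reports a 120 RPM cap for that deployment.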
EDIT:
I recreated the deployment with 20k TPM, and Usage / Limit now shows 20 / 30. So I now understand that this figure is the quota allocated to the deployment, not the amount currently in use. However, every single request fails with the error below, even though I am sending very few requests with small payloads. Any ideas?
httpx.HTTPStatusError: Client error '429 Too Many Requests' for url 'https://xxxxxx.openai.azure.com//openai/deployments/gpt4omini/chat/completions?api-version=2024-06-01'