@Luke Field Welcome to the Microsoft Q&A forum, and thank you for posting your query here!
The error message you’re seeing is related to rate limiting, which is a common practice in APIs to prevent abuse and ensure fair usage. In your case, the error message indicates that you’ve exceeded the token rate limit of your current AIServices S0 pricing tier.
Azure OpenAI’s quota feature lets you assign rate limits to your deployments, up to a global limit called your “quota.” Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). More info here.
The rate limit for the ChatCompletions_Create Operation under Azure OpenAI API version 2024-05-01-preview is determined by the number of tokens in your requests, not just the number of requests. Each request can contain a different number of tokens, depending on the length and complexity of the text. If your requests contain a large number of tokens, you could hit your rate limit even if the number of requests is within the limit.
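To make the point above concrete, here is a minimal arithmetic sketch. The function name `fits_tpm` and the 30,000 TPM limit are hypothetical values chosen for illustration; they are not your actual quota, and the real service estimates tokens in its own way.

```python
def fits_tpm(tokens_per_request, tpm_limit):
    """Return (total_tokens, within_limit) for requests sent within one minute.

    tokens_per_request: list with the token count of each request.
    tpm_limit: hypothetical Tokens-per-Minute limit of the deployment.
    """
    total = sum(tokens_per_request)
    return total, total <= tpm_limit


# Only 10 requests in a minute, but each one carries 4,000 tokens.
total, ok = fits_tpm([4000] * 10, tpm_limit=30_000)
print(total, ok)  # 40000 False
```

So a small number of large requests can exceed the token limit even when the request count looks harmless.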
Background about the limits:
Each deployment has both a Tokens-Per-Minute (TPM) and a Requests-Per-Minute (RPM) rate limit.
TPM rate limits are based on the maximum number of tokens that are estimated to be processed by a request at the time the request is received.
RPM rate limits are based on the number of requests received over time. The rate limit expects that requests be evenly distributed over a one-minute period. If this average flow isn't maintained, then requests may receive a 429 response even though the limit isn't met when measured over the course of a minute.
More info here.
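The "evenly distributed" behavior described above can be illustrated with a simplified model. This sketch assumes the limit is checked in fixed 10-second windows, each allowed a proportional share of the RPM limit. That window size and the function `rejected_in_burst` are assumptions made for illustration, not the service's documented algorithm.

```python
def rejected_in_burst(request_times_s, rpm_limit, window_s=10):
    """Count requests that exceed the per-window share of an RPM limit.

    Assumption: the minute is split into fixed windows of window_s seconds,
    and each window allows rpm_limit * window_s / 60 requests.
    """
    per_window = rpm_limit * window_s / 60
    counts = {}
    rejected = 0
    for t in sorted(request_times_s):
        window = int(t // window_s)
        counts[window] = counts.get(window, 0) + 1
        if counts[window] > per_window:
            rejected += 1  # this request would receive a 429 in this model
    return rejected


# 30 requests against a hypothetical limit of 60 RPM (10 per 10-second window).
burst = [i * 0.1 for i in range(30)]   # all sent within the first 3 seconds
spread = [i * 2.0 for i in range(30)]  # one request every 2 seconds

print(rejected_in_burst(burst, rpm_limit=60))   # 20
print(rejected_in_burst(spread, rpm_limit=60))  # 0
```

Both workloads send only 30 requests in the minute, which is half the limit, but the burst is rejected in this model because it is not spread out.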
Suggestions and best practices:
To minimize issues related to rate limits, follow the steps outlined here.
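The linked steps are not reproduced in this post. One commonly used client-side mitigation for 429 responses is retrying with exponential backoff, sketched below. `RateLimited` and `call_with_backoff` are hypothetical names for illustration; in real code you would catch whatever exception your client library raises for HTTP 429, and the delays here are assumptions you should tune.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the service."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, if the server sent a hint


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponentially growing delays.

    If the error carries a retry_after hint, that value is used as the delay;
    otherwise the delay is base_delay * 2**attempt. Up to 10% random jitter
    is added so that many clients do not retry at the same instant.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1 * delay))
```

You would wrap your chat completion call in a function and pass it as `fn`, for example `call_with_backoff(lambda: send_request(prompt))`, where `send_request` is your own (hypothetical) wrapper that raises `RateLimited` on a 429.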
View and request quota:
For an overall view of your quota allocations across deployments in a given region, select Management > Quota in Azure AI Studio.
Usage/Limit: For each quota name, this shows how much quota is used by deployments and the total quota approved for this subscription and region. The amount of quota used is also represented in the bar graph.
You can also use the Usage metrics to check your current usage.
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.