Not fully understanding quota on Azure OpenAI services

Luke Field 20 Reputation points
2024-07-04T02:23:46.2366667+00:00

I am currently trying to build a chatbot over enterprise data through a web app in Azure. Initially I wasn't having any problems, but recently, now that I am in the final steps of the process, I am constantly confronted with this error:

Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-05-01-preview have exceeded token rate limit of your current AIServices S0 pricing tier. Please retry after 86400 seconds. Please contact Azure support service if you would like to further increase the default rate limit.'}}

I have just increased my quota to 30,000 tokens per minute, which the portal says gives me 180 requests per minute. So I am not fully understanding why I am still hitting this limit.

Azure OpenAI Service
Azure AI services

1 answer

Sort by: Most helpful
  1. navba-MSFT 24,910 Reputation points Microsoft Employee
    2024-07-04T09:40:09.69+00:00

@Luke Field Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    The error message you’re seeing is related to rate limiting, which is a common practice in APIs to prevent abuse and ensure fair usage. In your case, the error message indicates that you’ve exceeded the token rate limit of your current AIServices S0 pricing tier.


    Azure OpenAI’s quota feature enables you to assign rate limits to your deployments, up to a global limit called your “quota.” Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). More info here.

    The rate limit for the ChatCompletions_Create Operation under Azure OpenAI API version 2024-05-01-preview is determined by the number of tokens in your requests, not just the number of requests. Each request can contain a different number of tokens, depending on the length and complexity of the text. If your requests contain a large number of tokens, you could hit your rate limit even if the number of requests is within the limit.
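Since TPM accounting counts both prompt and completion tokens, it can help to sanity-check a request against your per-minute budget before sending it. The sketch below is a rough illustration only: it uses the common ~4-characters-per-token heuristic rather than a real tokenizer, and `estimate_tokens` / `fits_in_budget` are hypothetical helper names. For exact counts, use the `tiktoken` package with the encoding for your deployed model.

```python
# Rough illustration (NOT the official tokenizer): estimate tokens with the
# ~4-characters-per-token heuristic often used for quick English estimates.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real counts vary with language and content."""
    return max(1, round(len(text) / chars_per_token))

def fits_in_budget(messages: list[str], max_completion_tokens: int,
                   tpm_limit: int = 30_000) -> bool:
    """Check whether one request's estimated prompt tokens plus its
    max_tokens setting stays within the per-minute token budget."""
    prompt_tokens = sum(estimate_tokens(m) for m in messages)
    return prompt_tokens + max_completion_tokens <= tpm_limit

# A long prompt plus a large max_tokens can blow through a 30,000 TPM
# budget in a single request, even though it is only 1 of 180 "requests".
long_prompt = ["You are a helpful assistant." * 5000]
print(fits_in_budget(long_prompt, max_completion_tokens=4_000))  # → False
```

This also explains why 30,000 TPM is advertised as 180 RPM: the service assumes an average request size (roughly 1,000 tokens per 6 RPM of quota), so fewer, larger requests exhaust the same budget.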

    Background about the limits:

    Tokens-Per-Minute (TPM) and Requests-Per-Minute (RPM) rate limits for the deployment.

    TPM rate limits are based on the maximum number of tokens that are estimated to be processed by a request at the time the request is received.

    RPM rate limits are based on the number of requests received over time. The rate limit expects that requests be evenly distributed over a one-minute period. If this average flow isn't maintained, then requests may receive a 429 response even though the limit isn't met when measured over the course of a minute.

    More info here.
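Because RPM accounting expects requests to be spread evenly across the minute, the standard client-side mitigation is to retry 429s with exponential backoff and jitter instead of retrying immediately. The sketch below is a minimal, generic illustration; `call_with_backoff`, `RateLimitError`, and `request_fn` are hypothetical names, not part of the Azure OpenAI SDK (the `openai` Python SDK raises its own `RateLimitError` you would catch instead).

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error raised by the service/SDK."""

def call_with_backoff(request_fn, max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0):
    """Retry a request with exponential backoff plus jitter on 429s.

    `request_fn` is any zero-argument callable that performs the API call.
    Spreading retries out avoids the bursts that trip the
    evenly-distributed RPM accounting described above.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the 429 to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))

# Example with a fake request that fails twice, then succeeds:
attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(call_with_backoff(fake_request, base_delay=0.01))  # → ok
```

In a real deployment you would also honor the `Retry-After` header the service returns with the 429 response, rather than relying on the computed delay alone.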


    Suggestions and best practices:

    To minimize issues related to rate limits, follow the steps outlined here.

    View and request quota:
    For an all-up view of your quota allocations across deployments in a given region, select Management > Quota in Azure AI Studio:

    Usage/Limit: For each quota name, this shows how much quota is consumed by deployments and the total quota approved for the subscription and region. The amount of quota used is also represented in the bar graph.

    You can also leverage the Usage metrics to check your current usage:

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

