OpenAI service - Regional quota limits and availability

Jaroslav Galambosi 20 Reputation points
2024-04-01T22:11:54.17+00:00

Dear community, I have two questions regarding the Azure OpenAI service:

  1. I assume the TPM limits for LLMs are soft limits. For example, GPT-4-Turbo in the France Central region is limited to 80K TPM, and I read that this could be increased. Can somebody share what increase is feasible and how it can be achieved?
  2. Is there a published rollout schedule for making LLMs available in other regions/locations? I am especially interested in GPT-4-Turbo.

Thank you in advance.

Regards,
Jaroslav Galambosi

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. AshokPeddakotla-MSFT 27,126 Reputation points
    2024-04-02T16:46:00.0866667+00:00

    Jaroslav Galambosi, greetings and welcome to the Microsoft Q&A forum!

    Please see the answers to your queries below.

    I assume the TPM limits for LLMs are soft limits. For example, GPT-4-Turbo in the France Central region is limited to 80K TPM, and I read that this could be increased. Can somebody share what increase is feasible and how it can be achieved?

    The increase of the TPM limits purely depends on your scenario and needs.

    When you create a model deployment, you have the option to assign Tokens-Per-Minute (TPM) to that deployment. TPM can be modified in increments of 1,000, and will map to the TPM and RPM rate limits enforced on your deployment.
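To illustrate how a TPM assignment maps to an RPM limit, here is a minimal sketch. It assumes the ratio of 6 RPM per 1,000 TPM that the Azure OpenAI quota documentation described for standard deployments around the time of this thread; the ratio can differ by model and may change, so check the current documentation. The function name `rpm_limit_for_tpm` is hypothetical and not part of any SDK.

```python
# Assumption: 6 requests per minute are granted per 1,000 TPM. This ratio is
# taken from the quota documentation at the time of writing and may not hold
# for every model or deployment type.
RPM_PER_1000_TPM = 6


def rpm_limit_for_tpm(tpm: int) -> int:
    """Return the approximate RPM limit implied by a TPM assignment."""
    # TPM is assigned in increments of 1,000, so reject anything else.
    if tpm <= 0 or tpm % 1000 != 0:
        raise ValueError("TPM must be a positive multiple of 1,000")
    return (tpm // 1000) * RPM_PER_1000_TPM


if __name__ == "__main__":
    # The 80K TPM regional limit mentioned in the question.
    print(rpm_limit_for_tpm(80_000))
```

Under that assumed ratio, the 80K TPM limit from the question would correspond to roughly 480 requests per minute.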

    If you are not sure about the capacity and would like to increase it dynamically, then you can follow the suggestions provided by Charlie. See Azure OpenAI Dynamic quota (Preview) for more information.

    Dynamic quota is useful in most scenarios, particularly when your application can use extra capacity opportunistically or the application itself is driving the rate at which the Azure OpenAI API is called.

    Typically, the situation in which you might prefer to avoid dynamic quota is when your application would provide an adverse experience if quota is volatile or increased.

    For dynamic quota, consider scenarios such as:

    • Bulk processing,
    • Creating summarizations or embeddings for Retrieval Augmented Generation (RAG),
    • Offline analysis of logs for generation of metrics and evaluations,
    • Low-priority research,
    • Apps that have a small amount of quota allocated.
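Workloads like these usually tolerate waiting, so a common companion to dynamic quota is to retry when the service reports that the rate limit was hit. The following is a minimal sketch of retry with exponential backoff, not the SDK's own mechanism. `RateLimited` is a hypothetical stand-in for whatever rate-limit error (HTTP 429) your client library raises, and `call_with_backoff` and its parameters are illustrative names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the service."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with exponential backoff when rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the error to the caller.
            # Double the wait on each attempt and add jitter so that many
            # clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_retries must be non-negative")
```

In practice you would wrap the actual API call in `request` and translate the library's rate-limit exception into the retry condition. Bulk or offline jobs can afford a generous `max_retries`, while interactive applications typically cannot.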

    Also, if you need a higher limit than the default Azure OpenAI Service quotas and limits allow, you can apply for a quota increase through this form.

    Is there a published rollout schedule for making LLMs available in other regions/locations? I am especially interested in GPT-4-Turbo.

    Please note that model availability depends on several factors, including capacity constraints. Please see GPT-4 and GPT-4 Turbo Preview model availability for more details.

    Currently, there is no specific information that can be shared about when these models will be available in other regions.

    Once these models are available in additional regions, the What's new in Azure OpenAI Service and Azure OpenAI Service models pages will be updated accordingly with more details.

    I would suggest keeping an eye on those pages for updates.

    I hope this answers your query. Do let me know if you have any further queries.


    If the response helped, please click "Accept Answer" and select "Yes" for "Was this answer helpful?".

    Doing so helps other community members with similar issues identify the solution. I highly appreciate your contribution to the community.


1 additional answer

  1. Charlie Wei 1,895 Reputation points
    2024-04-02T00:38:26.0533333+00:00

    Hello Jaroslav Galambosi,

    Firstly, for information on the TPM of all models across different regions, you can refer to the Regional quota limits.

    Next, there are currently three ways to increase your TPM:

    1. Enable Azure OpenAI Dynamic quota when deploying your model, which can temporarily increase TPM, but the quota is determined by the backend.
    2. Refer to this article and fill out the quota application form.
    3. For high-usage production environments, consider Provisioned Throughput Units (PTU).

    Lastly, there are no current plans for model deployment in new regions, but you can stay up to date through the Standard deployment model availability page.

    Best regards,
    Charlie


    If you find my response helpful, please consider accepting this answer and voting yes to support the community. Thank you!
