
How to adjust Tokens Per Minute Rate Limit

Kueny, Sean 20 Reputation points
2026-03-24T21:27:12.96+00:00

Hello! I am trying to deploy GPT-4.1 on an Azure Free Account (with a $200 credit for 30 days). During the deployment process, I found that the Tokens per Minute Rate Limit is set at 0 and could not be changed. What steps should I take to raise the Tokens per Minute Rate Limit? I apologize if any of these questions seem amateurish... I am completely new to Microsoft Azure.

Azure OpenAI Service

An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.


Answer accepted by question author
  1. Q&A Assist
    2026-03-24T21:27:23.8533333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    On a new Azure subscription, a Tokens-per-Minute (TPM) rate limit of 0 for a GPT model means that no quota has been allocated yet for that model in that region. To raise the limit, TPM must be assigned to the deployment from the subscription's quota for that model and region.

    For Azure OpenAI in Azure AI Foundry:

    1. Quota is assigned per subscription, per region, per model in TPM.
    2. When a deployment is created, TPM is assigned to that deployment and directly maps to its enforced TPM rate limit; RPM is then set proportionally.
    3. If the deployment UI shows 0 TPM and does not allow changes, it typically means there is no available quota for that model in that region on the subscription.
    4. In that case, the next step is to request a quota increase for that model and region through the quota request process described in the documentation. Once the request is approved, TPM can be allocated to the deployment.

    On free/trial subscriptions, available quota for Azure OpenAI models can be limited or absent; if no GPT‑4.1 quota is granted, the TPM field will remain effectively fixed at 0 until quota is approved.

    After quota is granted:

    1. Create or edit the GPT‑4.1 deployment in the Azure AI Foundry portal.
    2. Assign a positive TPM value (within the granted quota) to the deployment.
    3. The service will enforce that TPM as the rate limit, and an RPM limit will be applied proportionally for that model.
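To illustrate the proportional relationship between TPM and RPM described above, here is a minimal Python sketch. The function name `rpm_limit_for` and the ratio of 6 RPM per 1,000 TPM are assumptions chosen for illustration only; the actual ratio depends on the model and should be taken from the Azure OpenAI quota documentation.

```python
def rpm_limit_for(tpm: int, rpm_per_1000_tpm: float) -> int:
    """Return the RPM limit implied by a TPM allocation under a fixed ratio.

    Hypothetical helper: the real service computes this internally, and the
    ratio differs by model.
    """
    if tpm < 0:
        raise ValueError("tpm must be non-negative")
    return int(tpm * rpm_per_1000_tpm / 1000)


# Assumed ratio for illustration only; check the documentation for your model.
ASSUMED_RPM_PER_1000_TPM = 6.0

print(rpm_limit_for(0, ASSUMED_RPM_PER_1000_TPM))       # 0
print(rpm_limit_for(10_000, ASSUMED_RPM_PER_1000_TPM))  # 60
```

Under this model, a deployment stuck at 0 TPM also has an RPM limit of 0, which is why it cannot serve requests until quota is assigned.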

    If multiple deployments of the same model are created in the same region, ensure the sum of TPM across those deployments does not exceed the total quota for that model/region.
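The quota constraint can be sketched as simple arithmetic. The Python example below is a hypothetical bookkeeping helper, not an Azure API: `remaining_quota`, the deployment names, and the TPM figures are all made up for illustration.

```python
def remaining_quota(quota_tpm: int, deployments: dict[str, int]) -> int:
    """Return the unallocated TPM for one model in one region.

    quota_tpm:   total TPM quota granted for the model/region
    deployments: mapping of deployment name -> TPM assigned to it
    Raises ValueError if the assigned TPM exceeds the quota.
    """
    allocated = sum(deployments.values())
    if allocated > quota_tpm:
        raise ValueError(
            f"allocated {allocated} TPM exceeds quota of {quota_tpm} TPM"
        )
    return quota_tpm - allocated


# Hypothetical figures: 50,000 TPM of quota split across two deployments.
print(remaining_quota(50_000, {"chat-prod": 30_000, "chat-dev": 10_000}))  # 10000

# With no quota granted, nothing is available to assign to a new deployment.
print(remaining_quota(0, {}))  # 0
```

The second call mirrors the situation in the question: when the quota for the model and region is 0, a new deployment has no TPM available to it.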



1 additional answer

  1. Kueny, Sean 20 Reputation points
    2026-03-24T21:31:23.6233333+00:00

    Thank you! To make sure I understand, are you saying that it is not possible to deploy a functional GPT-4.1 model using an Azure Free Account? If my understanding is correct, what (if any) changes should I make to my account in order to deploy a functional model?

