An Azure service that provides access to OpenAI's models (for example, GPT‑4.1) with enterprise capabilities such as security, networking, and compliance controls.
On a new Azure subscription, a Tokens-per-Minute (TPM) rate limit of 0 for a GPT model means no quota has been allocated yet for that model in that region. To raise the limit, TPM must be assigned to the deployment from the model's quota for that subscription and region.
For Azure OpenAI in Azure AI Foundry:
- Quota is assigned per subscription, per region, per model in TPM.
- When a deployment is created, TPM is assigned to it from the available quota and directly maps to its enforced TPM rate limit; an RPM limit is then set proportionally (for most models, about 6 RPM per 1,000 TPM).
- If the deployment UI shows 0 TPM and does not allow changes, it typically means there is no available quota for that model in that region on the subscription.
- In that case, the next step is to request a quota increase for that model/region. Quota increases are submitted through the quota increase request form linked from the Azure OpenAI quota documentation; once approved, TPM can be allocated to the deployment.
On free/trial subscriptions, available quota for Azure OpenAI models can be limited or absent; if no GPT‑4.1 quota is granted, the TPM field will remain effectively fixed at 0 until quota is approved.
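The quota model described above (per subscription, per region, per model) can be sketched as a small illustrative lookup. The regions, model names, and numbers below are assumptions for the sketch, not output from any Azure API:

```python
# Illustrative model of per-subscription, per-region, per-model TPM quota.
# A new or free/trial subscription may have no entry (0 TPM) for a model
# and region until a quota request is approved.
quota = {
    ("eastus", "gpt-4.1"): 0,          # no quota granted yet
    ("eastus", "gpt-35-turbo"): 240_000,
}

def granted_tpm(region: str, model: str) -> int:
    """TPM quota granted for `model` in `region` (0 if none)."""
    return quota.get((region, model), 0)

print(granted_tpm("eastus", "gpt-4.1"))  # 0 -> deployment UI shows TPM fixed at 0
print(granted_tpm("westus", "gpt-4.1"))  # 0 -> no quota in that region either
```

A result of 0 here corresponds to the locked TPM field in the deployment UI: there is simply nothing to allocate until quota is granted.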
After quota is granted:
- Create or edit the GPT‑4.1 deployment in the Azure AI Foundry portal.
- Assign a positive TPM value (within the granted quota) to the deployment.
- The service will enforce that TPM as the rate limit, and an RPM limit will be applied proportionally for that model.
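The proportional RPM limit in the last step can be approximated with a small helper. The 6-RPM-per-1,000-TPM ratio is the documented default for many models, but treat it as an assumption and confirm the ratio for your model in the quota documentation:

```python
def rpm_limit(tpm: int, rpm_per_1000_tpm: int = 6) -> int:
    """Approximate the RPM limit derived from an assigned TPM value.

    Assumes the commonly documented ratio of 6 RPM per 1,000 TPM;
    the actual ratio can differ by model.
    """
    return tpm * rpm_per_1000_tpm // 1000

print(rpm_limit(100_000))  # 600 requests per minute
print(rpm_limit(30_000))   # 180 requests per minute
```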
If multiple deployments of the same model are created in the same region, ensure the sum of TPM across those deployments does not exceed the total quota for that model/region.
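That shared-quota constraint is easy to check before creating another deployment. A minimal sketch, with hypothetical TPM and quota values:

```python
def within_quota(deployment_tpms: list[int], model_region_quota: int) -> bool:
    """True if the combined TPM of all deployments of one model in one
    region stays within that model/region quota."""
    return sum(deployment_tpms) <= model_region_quota

# Hypothetical example: a 150k-TPM quota split across two deployments.
print(within_quota([100_000, 50_000], 150_000))  # True  -> allowed
print(within_quota([100_000, 80_000], 150_000))  # False -> second deployment too large
```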