This article contains brief example templates to help you get started with programmatically creating Azure OpenAI deployments that use quota to set tokens-per-minute (TPM) rate limits. With the introduction of quota, you must use API version 2023-05-01 for resource management activities. This API version applies only to managing your resources and doesn't affect the API version used for inferencing calls such as completions, chat completions, embeddings, and image generation.
Prerequisites
Before you create deployments programmatically, make sure you have the following:
- An Azure subscription. Create one for free.
- An existing Azure OpenAI resource. To create one, see Create a resource and deploy a model with Azure OpenAI.
- Quota available in the target region for the model you want to deploy. To check or request quota, see Manage Azure OpenAI in Microsoft Foundry Models quota.
- Permissions to create deployments on the resource. The Cognitive Services Contributor role at the resource scope provides the required access. For details, see Role-based access control for Azure OpenAI.
- The model name and version that you want to deploy. For supported models, see Azure OpenAI models.
Each tab in this article lists any tool-specific prerequisites, such as the required Azure CLI or Az PowerShell module version.
Create a deployment and query usage
Select the tab for the tool or template language you want to use. Each tab includes a deployment example that sets a TPM-based capacity, followed by a usage query that returns your remaining quota in the specified region.
Deployment
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01
Path parameters
| Parameter | Type | Required? | Description |
|---|---|---|---|
| accountName | string | Required | The name of your Azure OpenAI Resource. |
| deploymentName | string | Required | The deployment name you chose when you deployed an existing model or the name you would like a new model deployment to have. |
| resourceGroupName | string | Required | The name of the associated resource group for this model deployment. |
| subscriptionId | string | Required | Subscription ID for the associated subscription. |
| api-version | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |

Supported versions
- 2023-05-01 (Swagger spec)
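As an illustration of how the parameters above combine into the request URL, here is a minimal Python sketch that uses only the standard library. The function name `build_deployment_url` and the two constants are hypothetical helpers introduced for this example; they aren't part of any Azure SDK.

```python
from urllib.parse import quote, urlencode

# Values taken from the request line in this article.
MANAGEMENT_ENDPOINT = "https://management.azure.com"
API_VERSION = "2023-05-01"


def build_deployment_url(subscription_id, resource_group_name, account_name,
                         deployment_name, api_version=API_VERSION):
    """Return the management URL for a single deployment (hypothetical helper)."""
    segments = [
        "subscriptions", subscription_id,
        "resourceGroups", resource_group_name,
        "providers", "Microsoft.CognitiveServices",
        "accounts", account_name,
        "deployments", deployment_name,
    ]
    # Percent-encode every segment so an unexpected character can't change the path.
    path = "/".join(quote(segment, safe="") for segment in segments)
    query = urlencode({"api-version": api_version})
    return f"{MANAGEMENT_ENDPOINT}/{path}?{query}"


if __name__ == "__main__":
    print(build_deployment_url(
        "00000000-0000-0000-0000-000000000000",
        "resource-group-temp",
        "docs-openai-test-001",
        "gpt-4o-test-deployment",
    ))
```

Building the URL from named parts, rather than concatenating strings by hand, makes it harder to drop a segment or forget the `api-version` query parameter.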
Request body
The following table lists only a subset of the available request body parameters. For the full list, see the REST API reference documentation.
| Parameter | Type | Description |
|---|---|---|
| sku | Sku | The resource model definition representing SKU. |
| capacity | integer | The amount of quota you're assigning to this deployment. A value of 1 equals 1,000 tokens per minute (TPM). A value of 10 equals 10,000 TPM. |
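The capacity arithmetic and the shape of the request body can be sketched in Python as follows. The helper names (`capacity_to_tpm`, `tpm_to_capacity`, `build_deployment_body`) are hypothetical, and rejecting values below 1 is a defensive choice made for this sketch, not a documented service rule. The body layout mirrors the example request in this article.

```python
import json

# One capacity unit corresponds to 1,000 tokens per minute (TPM).
TOKENS_PER_CAPACITY_UNIT = 1_000


def capacity_to_tpm(capacity):
    """Convert a deployment capacity value to tokens per minute."""
    if capacity < 1:
        # Assumption of this sketch: treat non-positive capacity as a mistake.
        raise ValueError("capacity must be a positive integer")
    return capacity * TOKENS_PER_CAPACITY_UNIT


def tpm_to_capacity(tpm):
    """Convert a TPM target to a capacity value; capacity is an integer,
    so the target must be a whole multiple of 1,000."""
    if tpm < TOKENS_PER_CAPACITY_UNIT or tpm % TOKENS_PER_CAPACITY_UNIT != 0:
        raise ValueError("tpm must be a positive multiple of 1,000")
    return tpm // TOKENS_PER_CAPACITY_UNIT


def build_deployment_body(model_name, model_version, capacity, sku_name="Standard"):
    """Build the subset of the request body shown in this article."""
    return {
        "sku": {"name": sku_name, "capacity": capacity},
        "properties": {
            "model": {"format": "OpenAI", "name": model_name, "version": model_version}
        },
    }


if __name__ == "__main__":
    body = build_deployment_body("gpt-4o", "2024-11-20", tpm_to_capacity(10_000))
    print(json.dumps(body, indent=2))
```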
Example request
curl -X PUT "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/gpt-4o-test-deployment?api-version=2023-05-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{"sku":{"name":"Standard","capacity":10},"properties": {"model": {"format": "OpenAI","name": "gpt-4o","version": "2024-11-20"}}}'
Note
There are multiple ways to generate an authorization token. The easiest method for initial testing is to launch the Cloud Shell from the Azure portal. Then run az account get-access-token. You can use this token as your temporary authorization token for API testing.
For more information, see the REST API reference documentation for usages and deployment.
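For readers who prefer a script over curl, the deployment request can be approximated in Python with the standard library. This is a sketch under stated assumptions: `parse_access_token` and `build_put_request` are hypothetical helper names, and the sketch assumes that `az account get-access-token` prints a JSON object containing an `accessToken` field. The request is only constructed here, not sent.

```python
import json
import urllib.request


def parse_access_token(cli_output):
    """Extract the bearer token from the JSON printed by
    `az account get-access-token` (assumes an `accessToken` field)."""
    return json.loads(cli_output)["accessToken"]


def build_put_request(url, body, token):
    """Build, but don't send, the PUT request that creates or updates a deployment."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values copied from the example request in this article.
    url = (
        "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices"
        "/accounts/docs-openai-test-001/deployments/gpt-4o-test-deployment"
        "?api-version=2023-05-01"
    )
    body = {
        "sku": {"name": "Standard", "capacity": 10},
        "properties": {"model": {"format": "OpenAI", "name": "gpt-4o", "version": "2024-11-20"}},
    }
    request = build_put_request(url, body, "YOUR_AUTH_TOKEN")
    print(request.get_method(), request.full_url)
```

To send the request, pass the returned object to `urllib.request.urlopen` with a real token. A rejected token or an invalid body raises `urllib.error.HTTPError`.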
Usage
To query your quota usage in a given region for a specific subscription, send the following request:
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{location}/usages?api-version=2023-05-01
Path parameters
| Parameter | Type | Required? | Description |
|---|---|---|---|
| subscriptionId | string | Required | Subscription ID for the associated subscription. |
| location | string | Required | The location to view usage for, for example, eastus. |
| api-version | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |

Supported versions
- 2023-05-01 (Swagger spec)
Example request
curl -X GET "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN"
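The usage query can be sketched in Python in the same spirit. The helper names `build_usages_url` and `remaining_quota` are hypothetical. The response fields read here (`value`, `name.value`, `limit`, and `currentValue`) are an assumption about the shape of the usages response, so confirm them against the REST API reference before relying on this.

```python
from urllib.parse import quote, urlencode


def build_usages_url(subscription_id, location, api_version="2023-05-01"):
    """Return the management URL that lists quota usage for one region."""
    segments = [
        "subscriptions", subscription_id,
        "providers", "Microsoft.CognitiveServices",
        "locations", location,
        "usages",
    ]
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"https://management.azure.com/{path}?{urlencode({'api-version': api_version})}"


def remaining_quota(usages_response):
    """Map each usage name to limit minus current value.

    Assumes each entry in `value` carries `name.value`, `limit`, and
    `currentValue`; adjust the keys if the actual response differs.
    """
    remaining = {}
    for entry in usages_response.get("value", []):
        remaining[entry["name"]["value"]] = entry["limit"] - entry["currentValue"]
    return remaining


if __name__ == "__main__":
    print(build_usages_url("00000000-0000-0000-0000-000000000000", "eastus"))
```

Calling `remaining_quota` on the parsed JSON response returns a dictionary keyed by usage name, which makes it straightforward to check whether enough quota is left before you create a deployment.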