
Automate Azure OpenAI deployments with quota in Microsoft Foundry

This article contains brief example templates to help you get started programmatically creating Azure OpenAI deployments that use quota to set tokens-per-minute (TPM) rate limits. With the introduction of quota, you must use API version 2023-05-01 for resource management activities. This API version applies only to managing your resources. It doesn't affect the API version you use for inference calls such as completions, chat completions, embeddings, and image generation.

Prerequisites

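Before you create deployments programmatically, you need an Azure subscription, an Azure OpenAI resource in a resource group within that subscription, and an authorization token for Azure Resource Manager.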

Each tab in this article lists any tool-specific prerequisites, such as the required Azure CLI or Az PowerShell module version.

Create a deployment and query usage

Select the tab for the tool or template language you want to use. Each tab includes a deployment example that sets a TPM-based capacity, followed by a usage query that returns your remaining quota in the specified region.

Deployment

PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01

Path parameters

Parameter | Type | Required? | Description
--- | --- | --- | ---
accountName | string | Required | The name of your Azure OpenAI resource.
deploymentName | string | Required | The name you chose when you deployed an existing model, or the name you want a new model deployment to have.
resourceGroupName | string | Required | The name of the resource group that contains the Azure OpenAI resource.
subscriptionId | string | Required | The ID of the subscription that contains the resource.
api-version | string | Required | The API version to use for this operation, in YYYY-MM-DD format, passed in the query string.
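If you script deployments, you can assemble the deployment URL from the path parameters in the table above instead of editing it by hand. This is a minimal sketch: the function name deployment_url is hypothetical, and the default api-version is the 2023-05-01 management version this article uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the management URL for one model deployment
# from the subscription, resource group, account, and deployment names.
deployment_url() {
  local subscription_id=$1 resource_group=$2 account_name=$3 deployment_name=$4
  local api_version=${5:-2023-05-01}   # management API version used in this article
  printf 'https://management.azure.com/subscriptions/%s/resourceGroups/%s/providers/Microsoft.CognitiveServices/accounts/%s/deployments/%s?api-version=%s\n' \
    "$subscription_id" "$resource_group" "$account_name" "$deployment_name" "$api_version"
}

# Prints the same URL as the example request in this section.
deployment_url 00000000-0000-0000-0000-000000000000 resource-group-temp docs-openai-test-001 gpt-4o-test-deployment
```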

Supported versions: 2023-05-01

Request body

The following table lists only a subset of the available request body parameters. For the full list, see the REST API reference documentation.

Parameter | Type | Description
--- | --- | ---
sku | Sku | The resource model definition representing the SKU, including its name (for example, Standard) and capacity.
sku.capacity | integer | The amount of quota to assign to this deployment, in units of 1,000 tokens per minute (TPM). A value of 1 equals 1,000 TPM; a value of 10 equals 10,000 TPM.
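Because capacity is counted in whole units of 1,000 TPM, converting between a target throughput and a capacity value is integer arithmetic. This is a minimal sketch with hypothetical helper names; rounding up in tpm_to_capacity is an illustrative design choice so the deployment gets at least the requested TPM, not a rule enforced by the API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert capacity units to tokens per minute.
# 1 capacity unit = 1,000 TPM.
capacity_to_tpm() {
  echo $(( $1 * 1000 ))
}

# Hypothetical helper: convert a target TPM to capacity units, rounding up,
# because capacity must be a whole number of units.
tpm_to_capacity() {
  echo $(( ($1 + 999) / 1000 ))
}

capacity_to_tpm 10       # 10 units -> 10000 TPM
tpm_to_capacity 25500    # 25,500 TPM -> 26 units
```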

Example request

curl -X PUT "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/gpt-4o-test-deployment?api-version=2023-05-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{"sku": {"name": "Standard", "capacity": 10}, "properties": {"model": {"format": "OpenAI", "name": "gpt-4o", "version": "2024-11-20"}}}'
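To reuse the example request above with a different model or capacity, you can generate the request body from arguments. This is a minimal sketch: deployment_body and its argument order are hypothetical, and the SKU name is fixed to Standard because that's the value in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON request body from the example request,
# with the model name, model version, and capacity passed as arguments.
# The SKU name is hard-coded to "Standard", as in the example.
deployment_body() {
  local model_name=$1 model_version=$2 capacity=$3
  printf '{"sku": {"name": "Standard", "capacity": %d}, "properties": {"model": {"format": "OpenAI", "name": "%s", "version": "%s"}}}\n' \
    "$capacity" "$model_name" "$model_version"
}

# Usage with curl (URL and TOKEN must be set first):
#   curl -X PUT "$URL" -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $TOKEN" -d "$(deployment_body gpt-4o 2024-11-20 10)"
deployment_body gpt-4o 2024-11-20 10
```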

Note

There are multiple ways to generate an authorization token. For initial testing, the simplest option is to open Azure Cloud Shell from the Azure portal and run az account get-access-token. You can use the accessToken value from its output as a temporary authorization token for API testing.
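In a script, you can capture just the token instead of copying it by hand. This sketch uses the Azure CLI's global --query and --output options to extract the accessToken field. The --resource value assumes you're calling Azure Resource Manager at management.azure.com, as every request in this article does, and the variable names TOKEN and AUTH_HEADER are illustrative.

```shell
#!/usr/bin/env bash
# Request an Azure Resource Manager access token with the Azure CLI, for
# example from Cloud Shell. If az isn't installed or you aren't signed in,
# fall back to the placeholder used in this article's examples.
TOKEN=$(az account get-access-token \
  --resource https://management.azure.com/ \
  --query accessToken --output tsv 2>/dev/null) || TOKEN=""
AUTH_HEADER="Authorization: Bearer ${TOKEN:-YOUR_AUTH_TOKEN}"

# Usage: pass the header to curl instead of the hard-coded placeholder, e.g.
#   curl -H "$AUTH_HEADER" "<request URL>"
```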

For more information, see the REST API reference documentation for usages and deployment.

Usage

To query your quota usage in a given region for a specific subscription, send the following request:

GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{location}/usages?api-version=2023-05-01

Path parameters

Parameter | Type | Required? | Description
--- | --- | --- | ---
subscriptionId | string | Required | The ID of the subscription to query.
location | string | Required | The Azure region to view usage for, for example eastus.
api-version | string | Required | The API version to use for this operation, in YYYY-MM-DD format, passed in the query string.
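To check several regions, you can build the usages URL from the subscription ID and location rather than editing it by hand. This is a minimal sketch: usages_url is a hypothetical helper name, eastus and westus are just example region names, and the default api-version is the 2023-05-01 version this article uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the usages URL for one subscription and region.
usages_url() {
  local subscription_id=$1 location=$2
  local api_version=${3:-2023-05-01}   # management API version used in this article
  printf 'https://management.azure.com/subscriptions/%s/providers/Microsoft.CognitiveServices/locations/%s/usages?api-version=%s\n' \
    "$subscription_id" "$location" "$api_version"
}

# Print the usages URL for a few example regions (placeholder subscription ID).
for region in eastus westus; do
  usages_url 00000000-0000-0000-0000-000000000000 "$region"
done
```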

Supported versions: 2023-05-01

Example request

curl -X GET "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN"
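Before you create a deployment, you can compare its planned capacity with the quota that remains in the region. This sketch assumes that each entry in the usages response has fields named limit and currentValue, and that both use the same units as sku.capacity; verify both assumptions against a real response. The function name has_quota_for and the sample numbers are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether REQUESTED capacity units fit in the
# remaining quota. LIMIT and CURRENT are the values of the (assumed) "limit"
# and "currentValue" fields from the usages entry for your model and SKU.
# Decimal parts, such as the ".0" in "450.0", are dropped before comparing.
has_quota_for() {
  local requested=$1
  local limit=${2%.*}
  local current=${3%.*}
  local remaining=$(( limit - current ))
  echo "remaining=${remaining} requested=${requested}"
  # Exit status is 0 if the request fits, 1 otherwise.
  (( requested <= remaining ))
}

# Made-up sample values: a limit of 450 with 440 already in use.
if has_quota_for 10 450.0 440.0; then
  echo "OK to deploy"
fi
```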