Azure OpenAI Service models
Azure OpenAI Service is powered by a diverse set of models with different capabilities and price points. Model availability varies by region. For GPT-3 and other models retiring in July 2024, see Azure OpenAI Service legacy models.
Models | Description |
---|---|
GPT-4 | A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
GPT-3.5 | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
Embeddings | A set of models that can convert text into numerical vector form to facilitate text similarity. |
DALL-E (Preview) | A series of models in preview that can generate original images from natural language. |
Whisper (Preview) | A series of models in preview that can transcribe and translate speech to text. |
GPT-4
GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like GPT-3.5 Turbo, GPT-4 is optimized for chat and works well for traditional completions tasks. Use the Chat Completions API to use GPT-4. To learn more about how to interact with GPT-4 and the Chat Completions API check out our in-depth how-to.
gpt-4
gpt-4-32k
The gpt-4
model supports 8192 max input tokens and the gpt-4-32k
model supports up to 32,768 tokens.
GPT-3.5
GPT-3.5 models can understand and generate natural language or code. The most capable and cost effective model in the GPT-3.5 family is GPT-3.5 Turbo, which has been optimized for chat and works well for traditional completions tasks as well. GPT-3.5 Turbo is available for use with the Chat Completions API. GPT-3.5 Turbo Instruct has similar capabilities to text-davinci-003
using the Completions API instead of the Chat Completions API. We recommend using GPT-3.5 Turbo and GPT-3.5 Turbo Instruct over legacy GPT-3.5 and GPT-3 models.
gpt-35-turbo
gpt-35-turbo-16k
gpt-35-turbo-instruct
The gpt-35-turbo
model supports 4096 max input tokens and the gpt-35-turbo-16k
model supports up to 16,384 tokens. gpt-35-turbo-instruct
supports 4097 max input tokens.
To learn more about how to interact with GPT-3.5 Turbo and the Chat Completions API check out our in-depth how-to.
Embeddings models
Important
We strongly recommend using text-embedding-ada-002 (Version 2)
. This model/version provides parity with OpenAI's text-embedding-ada-002
. To learn more about the improvements offered by this model, please refer to OpenAI's blog post. Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
The previous embeddings models have been consolidated into the following new replacement model:
text-embedding-ada-002
DALL-E (Preview)
The DALL-E models, currently in preview, generate images from text prompts that the user provides.
Whisper (Preview)
The Whisper models, currently in preview, can be used for speech to text.
You can also use the Whisper model via Azure AI Speech batch transcription API. Check out What is the Whisper model? to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
Model summary table and region availability
Important
Due to high demand:
- South Central US is temporarily unavailable for creating new resources and deployments.
GPT-4 models
GPT-4 and GPT-4-32k are now available to all Azure OpenAI Service customers. Availability varies by region. If you don't see GPT-4 in your region, please check back later.
These models can only be used with the Chat Completion API.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
---|---|---|---|---|
gpt-4 2 (0314) |
East US1, France Central1 | N/A | 8,192 | September 2021 |
gpt-4-32k 2 (0314) |
East US1, France Central1 | N/A | 32,768 | September 2021 |
gpt-4 (0613) |
Australia East1, Canada East, East US1, East US 21, France Central1, Japan East1, Sweden Central, Switzerland North, UK South1 | N/A | 8,192 | September 2021 |
gpt-4-32k (0613) |
Australia East1, Canada East, East US1, East US 21, France Central1, Japan East1, Sweden Central, Switzerland North, UK South1 | N/A | 32,768 | September 2021 |
1 Due to high demand, availability is limited in the region
2 Version 0314
of gpt-4 and gpt-4-32k will be retired no earlier than July 5, 2024. See model updates for model upgrade behavior.
GPT-3.5 models
GPT-3.5 Turbo is used with the Chat Completion API. GPT-3.5 Turbo (0301) can also be used with the Completions API. GPT3.5 Turbo (0613) only supports the Chat Completions API.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
---|---|---|---|---|
gpt-35-turbo 1 (0301) |
East US, France Central, South Central US, UK South, West Europe | N/A | 4,096 | Sep 2021 |
gpt-35-turbo (0613) |
Australia East, Canada East, East US, East US 2, France Central, Japan East, North Central US, Sweden Central, Switzerland North, UK South | N/A | 4,096 | Sep 2021 |
gpt-35-turbo-16k (0613) |
Australia East, Canada East, East US, East US 2, France Central, Japan East, North Central US, Sweden Central, Switzerland North, UK South | N/A | 16,384 | Sep 2021 |
gpt-35-turbo-instruct (0914) |
East US, Sweden Central | N/A | 4,097 | Sep 2021 |
1 Version 0301
of gpt-35-turbo will be retired no earlier than July 5, 2024. See model updates for model upgrade behavior.
Embeddings models
These models can only be used with Embedding API requests.
Note
We strongly recommend using text-embedding-ada-002 (Version 2)
. This model/version provides parity with OpenAI's text-embedding-ada-002
. To learn more about the improvements offered by this model, please refer to OpenAI's blog post. Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) | Output dimensions |
---|---|---|---|---|---|
text-embedding-ada-002 (version 2) | Canada East, East US, East US2, France Central, Japan East, North Central US, South Central US, Switzerland North, UK South, West Europe | N/A | 8,191 | Sep 2021 | 1536 |
text-embedding-ada-002 (version 1) | East US, South Central US, West Europe | N/A | 2,046 | Sep 2021 | 1536 |
DALL-E models (Preview)
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (characters) | Training Data (up to) |
---|---|---|---|---|
dalle2 | East US | N/A | 1000 | N/A |
Whisper models (Preview)
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (audio file size) | Training Data (up to) |
---|---|---|---|---|
whisper | North Central US, West Europe | N/A | 25 MB | N/A |
Working with models
Finding what models are available
You can get a list of models that are available for both inference and fine-tuning by your Azure OpenAI resource by using the Models List API.
Model updates
Azure OpenAI now supports automatic updates for select model deployments. On models where automatic update support is available, a model version drop-down will be visible in Azure OpenAI Studio under Create new deployment and Edit deployment:
Auto update to default
When Auto-update to default is selected your model deployment will be automatically updated within two weeks of a change in the default version.
If you are still in the early testing phases for inference models, we recommend deploying models with auto-update to default set whenever it is available.
Specific model version
As your use of Azure OpenAI evolves, and you start to build and integrate with applications you may want to manually control model updates so that you can first test and validate that model performance is remaining consistent for your use case prior to upgrade.
When you select a specific model version for a deployment this version will remain selected until you either choose to manually update yourself, or once you reach the retirement date for the model. When the retirement date is reached the model will auto-upgrade to the default version at the time of retirement.
GPT-35-Turbo 0301 and GPT-4 0314 retirement
The gpt-35-turbo
(0301
) and both gpt-4
(0314
) models will be retired no earlier than July 5, 2024. Upon retirement, deployments will automatically be upgraded to the default version at the time of retirement. If you would like your deployment to stop accepting completion requests rather than upgrading, then you will be able to set the model upgrade option to expire through the API. We will publish guidelines on this by September 1.
Viewing deprecation dates
For currently deployed models, from Azure OpenAI Studio select Deployments:
To view deprecation/expiration dates for all available models in a given region from Azure OpenAI Studio select Models > Column options > Select Deprecation fine tune and Deprecation inference:
Model deployment upgrade configuration
There are three distinct model deployment upgrade options which are configurable via REST API:
Name | Description |
---|---|
OnceNewDefaultVersionAvailable |
Once a new version is designated as the default, the model deployment will auto-upgrade to the default version within two weeks of that designation change being made. |
OnceCurrentVersionExpired |
Once the retirement date is reached the model deployment will auto-upgrade to the current default version. |
NoAutoUpgrade |
The model deployment will never auto-upgrade. Once the retirement date is reached the model deployment will stop working. You will need to update your code referencing that deployment to point to a non-expired model deployment. |
To query the current model deployment settings including the deployment upgrade configuration for a given resource use Deployments List
GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments?api-version=2023-05-01
Path parameters
Parameter | Type | Required? | Description |
---|---|---|---|
acountname |
string | Required | The name of your Azure OpenAI Resource. |
resourceGroupName |
string | Required | The name of the associated resource group for this model deployment. |
subscriptionId |
string | Required | Subscription ID for the associated subscription. |
api-version |
string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |
Supported versions
2023-05-01
Swagger spec
Example response
{
"id": "/subscriptions/{Subcription-GUID}/resourceGroups/{Resource-Group-Name}/providers/Microsoft.CognitiveServices/accounts/{Resource-Name}/deployments/text-davinci-003",
"type": "Microsoft.CognitiveServices/accounts/deployments",
"name": "text-davinci-003",
"sku": {
"name": "Standard",
"capacity": 60
},
"properties": {
"model": {
"format": "OpenAI",
"name": "text-davinci-003",
"version": "1"
},
"versionUpgradeOption": "OnceNewDefaultVersionAvailable",
"capabilities": {
"completion": "true",
"search": "true"
},
"raiPolicyName": "Microsoft.Default",
"provisioningState": "Succeeded",
"rateLimits": [
{
"key": "request",
"renewalPeriod": 10,
"count": 60
},
{
"key": "token",
"renewalPeriod": 60,
"count": 60000
}
]
}
You can then take the settings from this list to construct an update model REST API call as described below if you want to modify the deployment upgrade configuration.
Update & deploy models via the API
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01
Path parameters
Parameter | Type | Required? | Description |
---|---|---|---|
acountname |
string | Required | The name of your Azure OpenAI Resource. |
deploymentName |
string | Required | The deployment name you chose when you deployed an existing model or the name you would like a new model deployment to have. |
resourceGroupName |
string | Required | The name of the associated resource group for this model deployment. |
subscriptionId |
string | Required | Subscription ID for the associated subscription. |
api-version |
string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |
Supported versions
2023-05-01
Swagger spec
Request body
This is only a subset of the available request body parameters. For the full list of the parameters, you can refer to the REST API reference documentation.
Parameter | Type | Description |
---|---|---|
versionUpgradeOption | String | Deployment model version upgrade options:OnceNewDefaultVersionAvailable OnceCurrentVersionExpired NoAutoUpgrade |
capacity | integer | This represents the amount of quota you are assigning to this deployment. A value of 1 equals 1,000 Tokens per Minute (TPM) |
Example request
curl -X PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/text-embedding-ada-002-test-1?api-version=2023-05-01 \
-H "Content-Type: application/json" \
-H 'Authorization: Bearer YOUR_AUTH_TOKEN' \
-d '{"sku":{"name":"Standard","capacity":1},"properties": {"model": {"format": "OpenAI","name": "text-embedding-ada-002","version": "2"},"versionUpgradeOption":"OnceCurrentVersionExpired"}}'
Note
There are multiple ways to generate an authorization token. The easiest method for initial testing is to launch the Cloud Shell from the Azure portal. Then run az account get-access-token
. You can use this token as your temporary authorization token for API testing.
Example response
{
"id": "/subscriptions/{subscription-id}/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/text-embedding-ada-002-test-1",
"type": "Microsoft.CognitiveServices/accounts/deployments",
"name": "text-embedding-ada-002-test-1",
"sku": {
"name": "Standard",
"capacity": 1
},
"properties": {
"model": {
"format": "OpenAI",
"name": "text-embedding-ada-002",
"version": "2"
},
"versionUpgradeOption": "OnceCurrentVersionExpired",
"capabilities": {
"embeddings": "true",
"embeddingsMaxInputs": "1"
},
"provisioningState": "Succeeded",
"ratelimits": [
{
"key": "request",
"renewalPeriod": 10,
"count": 2
},
{
"key": "token",
"renewalPeriod": 60,
"count": 1000
}
]
},
"systemData": {
"createdBy": "docs@contoso.com",
"createdByType": "User",
"createdAt": "2023-06-13T00:12:38.885937Z",
"lastModifiedBy": "docs@contoso.com",
"lastModifiedByType": "User",
"lastModifiedAt": "2023-06-13T02:41:04.8410965Z"
},
"etag": "\"{GUID}\""
}
Next steps
Feedback
Submit and view feedback for