
Azure OpenAI image and audio REST API reference (2025-04-01-preview)

This article documents the image generation and audio (transcription, translation, and speech) data plane inference REST API operations for Azure OpenAI in the 2025-04-01-preview release. For chat completions, embeddings, assistants, responses, vector stores, and all other operations, see the official Azure OpenAI REST API reference.

API specs

Management of, and interaction with, Azure OpenAI models and resources is divided across three primary API surfaces:

  • Control plane
  • Data plane - authoring
  • Data plane - inference

Each API surface/specification encapsulates a different set of Azure OpenAI capabilities. Each API has its own unique set of preview and stable/generally available (GA) API releases. Preview releases currently tend to follow a monthly cadence.

Important

There is now a new preview inference API. Learn more in our API lifecycle guide.

API Latest preview release Latest GA release Specifications Description
Control plane 2025-07-01-preview 2025-06-01 Spec files The control plane API is used for operations like creating resources, model deployment, and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.
Data plane v1 preview v1 Spec files The data plane API controls inference and authoring operations.

Authentication

Azure OpenAI provides two methods for authentication: API keys or Microsoft Entra ID.

  • API key authentication: For this type of authentication, all API requests must include the API key in the api-key HTTP header. The Quickstart provides guidance for how to make calls with this type of authentication.

  • Microsoft Entra ID authentication: You can authenticate an API call using a Microsoft Entra token. Authentication tokens are included in a request as the Authorization header. The token provided must be preceded by Bearer, for example Bearer YOUR_AUTH_TOKEN. You can read our how-to guide on authenticating with Microsoft Entra ID.
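The two authentication options differ only in which header carries the credential. The following minimal Python sketch assumes you already have either an API key or a Microsoft Entra token string. The helper name build_auth_headers is hypothetical. Refusing to send both headers is a choice made by this sketch, not a documented service rule.

```python
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None,
                       entra_token: Optional[str] = None) -> Dict[str, str]:
    """Return headers for exactly one Azure OpenAI authentication method.

    Hypothetical helper: an API key goes in the api-key header, and a
    Microsoft Entra token goes in the Authorization header as a Bearer token.
    """
    # Sketch's own rule: reject both-or-neither so the intent is unambiguous.
    if (api_key is None) == (entra_token is None):
        raise ValueError("Provide exactly one of api_key or entra_token")
    if api_key is not None:
        return {"api-key": api_key}
    return {"Authorization": f"Bearer {entra_token}"}
```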

REST API versioning

The service APIs are versioned using the api-version query parameter. Dated versions use the YYYY-MM-DD format, and preview versions append -preview, as in 2025-04-01-preview. For example:

POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-06-01
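The following sketch shows how the parts of that URL fit together, using only the Python standard library. The helper build_deployment_url and its parameter names are illustrative assumptions. The host pattern and the api-version query parameter come from this article.

```python
from urllib.parse import quote, urlencode


def build_deployment_url(resource_name: str, deployment: str,
                         operation: str, api_version: str) -> str:
    """Build .../openai/deployments/{deployment}/{operation}?api-version=...

    Hypothetical helper. operation is a path such as "chat/completions"
    or "audio/speech".
    """
    base = f"https://{resource_name}.openai.azure.com"
    # Escape the deployment name so it stays a single path segment.
    path = f"/openai/deployments/{quote(deployment, safe='')}/{operation}"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"
```

Calling build_deployment_url("YOUR_RESOURCE_NAME", "YOUR_DEPLOYMENT_NAME", "chat/completions", "2024-06-01") yields the URL in the example above.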

Data plane inference

The rest of this article covers the image and audio operations in the 2025-04-01-preview release of the Azure OpenAI data plane inference specification.

For the GA image and audio operations, see the GA image and audio REST API reference.

Transcriptions - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

Transcribes audio into the input language.

URI Parameters

Name In Required Type Description
endpoint path Yes string (url) Supported Azure OpenAI endpoint: protocol and hostname, for example https://aoairesource.openai.azure.com (replace "aoairesource" with your Azure OpenAI resource name). Format: https://{your-resource-name}.openai.azure.com
deployment-id path Yes string The name of your model deployment.
api-version query Yes string The API version to use for this operation, for example 2025-04-01-preview.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.

Name Required Type Description
Authorization Yes, unless api-key is used string Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
  To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
  Type: oauth2
  Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
  Scope: https://ai.azure.com/.default
api-key Yes, unless Authorization is used string Your Azure OpenAI API key.

Request Body

Content-Type: multipart/form-data

Name Type Description Required Default
model string ID of the model to use. The options are gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1, and gpt-4o-transcribe-diarize. Yes
file string The audio file object to transcribe. Yes
language string The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency. No
prompt string Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. No
response_format audioResponseFormat Defines the format of the output. No
temperature number The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit. No 0
timestamp_granularities[] array The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Supported values are word, segment, or both. Note: Segment timestamps add no latency, but generating word timestamps incurs additional latency. No ['segment']
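This endpoint takes multipart/form-data, so the request body is a set of named text parts plus the audio file. The following Python sketch encodes such a body using only the standard library, and it enforces the documented rule that timestamp granularities require verbose_json. Note these assumptions:

- The helper names are hypothetical.
- The boundary string is arbitrary.
- Sending each granularity as a repeated timestamp_granularities[] part is an assumption about how array fields are encoded in multipart requests.

```python
import uuid
from typing import List, Optional, Tuple


def transcription_fields(model: str, response_format: str = "json",
                         language: Optional[str] = None,
                         timestamp_granularities: Optional[List[str]] = None
                         ) -> List[Tuple[str, str]]:
    """Build the text form fields for audio/transcriptions (hypothetical helper)."""
    fields = [("model", model), ("response_format", response_format)]
    if language:
        fields.append(("language", language))  # ISO-639-1 code, for example "en"
    for granularity in timestamp_granularities or []:
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities[] requires verbose_json")
        if granularity not in ("word", "segment"):
            raise ValueError(f"unsupported granularity: {granularity}")
        # Assumption: array values are sent as repeated "name[]" parts.
        fields.append(("timestamp_granularities[]", granularity))
    return fields


def encode_multipart(fields: List[Tuple[str, str]], file_field: str,
                     filename: str, file_bytes: bytes,
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields plus one file; returns (body, Content-Type value)."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    crlf = b"\r\n"  # multipart bodies use CRLF line endings
    chunks: List[bytes] = []
    for name, value in fields:
        chunks += [delimiter, crlf,
                   f'Content-Disposition: form-data; name="{name}"'.encode(), crlf,
                   crlf, value.encode("utf-8"), crlf]
    chunks += [delimiter, crlf,
               (f'Content-Disposition: form-data; name="{file_field}"; '
                f'filename="{filename}"').encode(), crlf,
               b"Content-Type: application/octet-stream", crlf,
               crlf, file_bytes, crlf]
    chunks += [delimiter, b"--", crlf]  # closing delimiter ends the body
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

To send the request, pass body as the request data and the returned value as the Content-Type header, together with one of the authentication headers described in the Authentication section.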

Responses

Status Code: 200

Description: OK

Content-Type Type Description
application/json object The transcription result (when response_format is json or verbose_json).
text/plain string Transcribed text in the output format (when response_format is text, vtt, or srt).
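The Content-Type of a successful response tells the client how to read it. The following minimal sketch assumes the raw response body and Content-Type header are already available. The helper name is hypothetical.

```python
import json
from typing import Union


def parse_transcription(content_type: str, body: bytes) -> Union[dict, str]:
    """Return a dict for application/json and a str for text/plain responses."""
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body.decode("utf-8"))  # json or verbose_json
    if media_type == "text/plain":
        return body.decode("utf-8")  # text, srt, or vtt
    raise ValueError(f"unexpected content type: {content_type}")
```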

Examples

Example

Gets transcribed text and associated metadata from provided spoken audio data. The response is a JSON object (response_format json or verbose_json).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

Responses: Status Code: 200

{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets transcribed text from provided spoken audio data. The response is plain text (response_format text, srt, or vtt).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

---multipart-boundary
Content-Disposition: form-data; name="file"; filename="file.wav"
Content-Type: application/octet-stream

RIFF..audio.data.omitted
---multipart-boundary--

Responses: Status Code: 200

{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}

Translations - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

Transcribes and translates input audio into English text.

URI Parameters

Name In Required Type Description
endpoint path Yes string (url) Supported Azure OpenAI endpoint: protocol and hostname, for example https://aoairesource.openai.azure.com (replace "aoairesource" with your Azure OpenAI resource name). Format: https://{your-resource-name}.openai.azure.com
deployment-id path Yes string The name of your model deployment.
api-version query Yes string The API version to use for this operation, for example 2025-04-01-preview.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.

Name Required Type Description
Authorization Yes, unless api-key is used string Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
  To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
  Type: oauth2
  Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
  Scope: https://ai.azure.com/.default
api-key Yes, unless Authorization is used string Your Azure OpenAI API key.

Request Body

Content-Type: multipart/form-data

Name Type Description Required Default
file string The audio file to translate. Yes
prompt string Optional text to guide the model's style or continue a previous audio segment. The prompt should be in English. No
response_format audioResponseFormat Defines the format of the output. No
temperature number The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit. No 0

Responses

Status Code: 200

Description: OK

Content-Type Type Description
application/json object The translation result (when response_format is json or verbose_json).
text/plain string Transcribed text in the output format (when response_format was one of text, vtt, or srt).

Examples

Example

Gets English-language transcribed text and associated metadata from provided spoken audio data. The response is a JSON object (response_format json or verbose_json).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

---multipart-boundary
Content-Disposition: form-data; name="file"; filename="file.wav"
Content-Type: application/octet-stream

RIFF..audio.data.omitted
---multipart-boundary--

Responses: Status Code: 200

{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets English-language transcribed text from provided spoken audio data. The response is plain text (response_format text, srt, or vtt).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

---multipart-boundary
Content-Disposition: form-data; name="file"; filename="file.wav"
Content-Type: application/octet-stream

RIFF..audio.data.omitted
---multipart-boundary--

Responses: Status Code: 200

{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}
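Translation requests use the same multipart/form-data encoding as transcription requests, but they have no language field because the output is always English. The following sketch prepares the request with urllib.request from the Python standard library, but does not send it. The helper name and the choice of API key authentication are illustrative assumptions.

```python
import urllib.request


def build_translation_request(url: str, api_key: str, body: bytes,
                              content_type: str) -> urllib.request.Request:
    """Prepare a POST to audio/translations without sending it.

    body is an already-encoded multipart/form-data payload, and content_type
    is its matching header value, including the boundary parameter.
    """
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"api-key": api_key, "Content-Type": content_type},
    )
```

Passing the returned request to urllib.request.urlopen sends it.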

Speech - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/speech?api-version=2025-04-01-preview

Generates audio from the input text.

URI Parameters

Name In Required Type Description
endpoint path Yes string (url) Supported Azure OpenAI endpoint: protocol and hostname, for example https://aoairesource.openai.azure.com (replace "aoairesource" with your Azure OpenAI resource name). Format: https://{your-resource-name}.openai.azure.com
deployment-id path Yes string The name of your model deployment.
api-version query Yes string The API version to use for this operation, for example 2025-04-01-preview.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.

Name Required Type Description
Authorization Yes, unless api-key is used string Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
  To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
  Type: oauth2
  Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
  Scope: https://ai.azure.com/.default
api-key Yes, unless Authorization is used string Your Azure OpenAI API key.

Request Body

Content-Type: application/json

Name Type Description Required Default
input string The text to synthesize audio for. The maximum length is 4,096 characters. Yes
response_format enum The format to synthesize the audio in. Possible values: mp3, opus, aac, flac, wav, pcm. No
speed number The speed of the synthesized audio. Must be between 0.25 and 4.0. No 1.0
voice enum The voice to use for speech synthesis. Possible values: alloy, echo, fable, onyx, nova, shimmer. Yes
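A client can check the limits in this table before it sends a request. The following sketch validates the speech parameters and encodes them as a JSON body, like the example request for this operation. The constant names and the helper are hypothetical. Rejecting empty input is this sketch's own choice, and mp3 is the default here only because the example requests it.

```python
import json

VOICES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})
RESPONSE_FORMATS = frozenset({"mp3", "opus", "aac", "flac", "wav", "pcm"})


def speech_body(text: str, voice: str, response_format: str = "mp3",
                speed: float = 1.0) -> bytes:
    """Check the documented speech limits and return the JSON request body."""
    # The 4,096-character limit is documented; rejecting "" is an assumption.
    if not text or len(text) > 4096:
        raise ValueError("input must be 1 to 4,096 characters")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {"input": text, "voice": voice,
               "response_format": response_format, "speed": speed}
    return json.dumps(payload).encode("utf-8")
```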

Responses

Status Code: 200

Description: OK

Content-Type Type Description
application/octet-stream string The synthesized audio in the requested response_format.

Examples

Example

Synthesizes audio from the provided text.

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/speech?api-version=2025-04-01-preview

{
 "input": "Hi! What are you going to make?",
 "voice": "fable",
 "response_format": "mp3"
}

Responses: Status Code: 200

{
  "body": "101010101"
}
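The response is binary audio (application/octet-stream), so a client typically writes it straight to a file. The following minimal sketch copies any readable binary stream to disk in chunks, for example the object that urllib.request.urlopen returns. The helper name and the 64 KiB chunk size are arbitrary choices.

```python
from typing import BinaryIO


def save_audio(stream: BinaryIO, path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy a binary response stream to path; returns the number of bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)  # read in pieces to bound memory use
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```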

Image generations - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2025-04-01-preview

Generates a batch of images from a text prompt on a given image generation model deployment.

URI Parameters

Name In Required Type Description
endpoint path Yes string (url) Supported Azure OpenAI endpoint: protocol and hostname, for example https://aoairesource.openai.azure.com (replace "aoairesource" with your Azure OpenAI resource name). Format: https://{your-resource-name}.openai.azure.com
deployment-id path Yes string The name of your model deployment.
api-version query Yes string The API version to use for this operation, for example 2025-04-01-preview.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.

Name Required Type Description
Authorization Yes, unless api-key is used string Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
  To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
  Type: oauth2
  Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
  Scope: https://ai.azure.com/.default
api-key Yes, unless Authorization is used string Your Azure OpenAI API key.

Request Body

Content-Type: application/json

Name Type Description Required Default
background imageBackground Sets the transparency of the background of the generated images. This parameter is only supported for gpt-image-1 series models. No auto
n integer The number of images to generate. For dall-e-3, only n=1 is supported. No 1
output_compression integer The compression level (0-100%) for the generated images. This parameter is only supported for gpt-image-1 series models with the jpeg output format. No 100
output_format imagesOutputFormat The file format in which the generated images are returned. Only supported for gpt-image-1 series models. No png
prompt string A text description of the desired image(s). The maximum length is 32,000 characters for gpt-image-1 series models and 4,000 characters for dall-e-3. Yes
partial_images integer The number of partial images to generate for streaming responses that return partial images. The value must be between 0 and 3. When set to 0, the response is a single image sent in one streaming event. The final image might be sent before all partial images are generated if the full image is generated more quickly. No 0
stream boolean Generate the image in streaming mode. No false
quality imageQuality The quality of the image that will be generated. No auto
response_format imagesResponseFormat The format in which the generated images are returned. Possible values: url, b64_json. This parameter isn't supported for gpt-image-1 series models, which always return base64-encoded images. No url
size imageSize The size of the generated images. No auto
style imageStyle The style of the generated images. Only supported for dall-e-3. No vivid
user string A unique identifier representing your end-user, which can help to monitor and detect abuse. No
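Several of these parameters depend on the model behind the deployment. The following sketch builds a JSON request body and checks a few of the documented model-specific rules: prompt length, n=1 for dall-e-3, and style only for dall-e-3. The model_family argument is a hypothetical client-side label, because the service determines the model from the deployment. No other rules are checked.

```python
import json
from typing import Optional

MAX_PROMPT_CHARS = {"dall-e-3": 4000, "gpt-image-1": 32000}


def image_generation_body(prompt: str, model_family: str, n: int = 1,
                          size: Optional[str] = None,
                          quality: Optional[str] = None,
                          style: Optional[str] = None) -> str:
    """Return a JSON body for images/generations after basic client-side checks."""
    if model_family not in MAX_PROMPT_CHARS:
        raise ValueError(f"unknown model family: {model_family}")
    if len(prompt) > MAX_PROMPT_CHARS[model_family]:
        raise ValueError("prompt is too long for this model")
    if model_family == "dall-e-3" and n != 1:
        raise ValueError("dall-e-3 supports only n=1")
    if style is not None and model_family != "dall-e-3":
        raise ValueError("style is only supported for dall-e-3")
    body = {"prompt": prompt, "n": n}
    # Omit unset optional fields so that the service defaults apply.
    for key, value in (("size", size), ("quality", quality), ("style", style)):
        if value is not None:
            body[key] = value
    return json.dumps(body)
```

With quality="standard" and style="natural", the sketch reproduces the body of the example request for this operation.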

Responses

Status Code: 200

Description: OK

Content-Type Type Description
application/json generateImagesResponse

Status Code: default

Description: An error occurred.

Content-Type Type Description
application/json dalleErrorResponse

Examples

Example

Creates images given a prompt.

POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2025-04-01-preview

{
 "prompt": "In the style of WordArt, Microsoft Clippy wearing a cowboy hat.",
 "n": 1,
 "style": "natural",
 "quality": "standard"
}

Responses: Status Code: 200

{
  "body": {
    "created": 1698342300,
    "data": [
      {
        "revised_prompt": "A vivid, natural representation of Microsoft Clippy wearing a cowboy hat.",
        "prompt_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          },
          "profanity": {
            "detected": false,
            "filtered": false
          },
          "custom_blocklists": {
            "filtered": false,
            "details": []
          }
        },
        "url": "https://dalletipusw2.blob.core.windows.net/private/images/e5451cc6-b1ad-4747-bd46-b89a3a3b8bc3/generated_00.png?se=2023-10-27T17%3A45%3A09Z&...",
        "content_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          }
        }
      }
    ]
  }
}
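Each entry in data carries either a url or a base64-encoded b64_json image. The b64_json form is returned when response_format is b64_json, and always for gpt-image-1 series models. The following sketch normalizes both shapes. It takes the response object itself, that is, the object shown under "body" in the example above. The helper name is hypothetical.

```python
import base64
from typing import List


def extract_images(response: dict) -> List[dict]:
    """Return one entry per generated image, holding a URL or decoded bytes."""
    images = []
    for item in response.get("data", []):
        entry = {"revised_prompt": item.get("revised_prompt")}
        if "b64_json" in item:
            entry["bytes"] = base64.b64decode(item["b64_json"])
        elif "url" in item:
            entry["url"] = item["url"]  # URLs are time-limited (see the se= parameter)
        images.append(entry)
    return images
```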

Image generations - Edit

POST https://{endpoint}/openai/deployments/{deployment-id}/images/edits?api-version=2025-04-01-preview

Edits an image from a text prompt on a given gpt-image-1 model deployment.

URI Parameters

Name In Required Type Description
endpoint path Yes string (url) Supported Azure OpenAI endpoint: protocol and hostname, for example https://aoairesource.openai.azure.com (replace "aoairesource" with your Azure OpenAI resource name). Format: https://{your-resource-name}.openai.azure.com
deployment-id path Yes string The name of your model deployment.
api-version query Yes string The API version to use for this operation, for example 2025-04-01-preview.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.

Name Required Type Description
Authorization Yes, unless api-key is used string Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
  To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
  Type: oauth2
  Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
  Scope: https://ai.azure.com/.default
api-key Yes, unless Authorization is used string Your Azure OpenAI API key.

Request Body

Content-Type: multipart/form-data

Name Type Description Required Default
image string or array The image(s) to edit: a supported image file or an array of images. Each image should be a PNG or JPG file smaller than 50 MB. Yes
input_fidelity string Controls how much effort the model exerts to match the style and features, especially facial features, of the input images. This parameter is only supported for gpt-image-1 series models. Possible values: high, low. No low
mask string An additional image whose fully transparent areas (where alpha is zero) indicate where the image should be edited. If multiple images are provided, the mask is applied to the first image. Must be a valid PNG file, smaller than 4 MB, with the same dimensions as the image. No
n integer The number of images to generate. Must be between 1 and 10. No 1
prompt string A text description of the desired image(s). The maximum length is 32,000 characters. Yes
quality imageQuality The quality of the image that will be generated. No auto
partial_images integer The number of partial images to generate for streaming responses that return partial images. The value must be between 0 and 3. When set to 0, the response is a single image sent in one streaming event. The final image might be sent before all partial images are generated if the full image is generated more quickly. No
stream boolean Edit the image in streaming mode. No false
response_format imagesResponseFormat The format in which the generated images are returned. gpt-image-1 series models always return base64-encoded images. No url
size imageSize The size of the generated images. No auto
user string A unique identifier representing your end-user, which can help to monitor and detect abuse. No
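A client can check some of the documented limits on images/edits inputs locally. The following sketch uses only the standard library, and it makes these assumptions:

- It reads PNG dimensions from the IHDR header. This is a cheap signature check, not full PNG validation.
- It compares mask and image dimensions only when both are PNG files.
- It treats 50MB and 4MB as binary megabytes.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_IMAGE_BYTES = 50 * 1024 * 1024  # "less than 50MB"; binary MB is an assumption
MAX_MASK_BYTES = 4 * 1024 * 1024    # "less than 4MB"; binary MB is an assumption


def png_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from a PNG's IHDR chunk, or None if not a PNG."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    # IHDR data starts at byte 16: width and height as big-endian uint32.
    return struct.unpack(">II", data[16:24])


def check_edit_inputs(image: bytes, mask: Optional[bytes] = None, n: int = 1) -> None:
    """Raise ValueError if the inputs break the documented images/edits limits."""
    if len(image) >= MAX_IMAGE_BYTES:
        raise ValueError("image must be smaller than 50 MB")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if mask is None:
        return
    mask_dims = png_dimensions(mask)
    if mask_dims is None:
        raise ValueError("mask must be a PNG file")
    if len(mask) >= MAX_MASK_BYTES:
        raise ValueError("mask must be smaller than 4 MB")
    image_dims = png_dimensions(image)
    # Dimensions are compared only when the image is also a PNG.
    if image_dims is not None and image_dims != mask_dims:
        raise ValueError(f"mask {mask_dims} does not match image {image_dims}")
```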

Responses

Status Code: 200

Description: OK

Content-Type Type Description
application/json generateImagesResponse

Status Code: default

Description: An error occurred.

Content-Type Type Description
application/json dalleErrorResponse

Components

For the schema definitions used by chat, completions, embeddings, responses, and other text operations, see the Azure OpenAI REST API reference. The following schemas support the image and audio operations on this page.

innerErrorCode

Error codes for the inner error object.

Property Value
Description Error codes for the inner error object.
Type string
Values ResponsibleAIPolicyViolation

dalleErrorResponse

Name Type Description Required Default
error dalleError No

dalleError

Name Type Description Required Default
inner_error dalleInnerError Inner error with additional details. No
param string No
type string No

dalleInnerError

Inner error with additional details.

Name Type Description Required Default
code innerErrorCode Error codes for the inner error object. No
content_filter_results dalleFilterResults Content filtering results for each category (hate, sexual, violence, self_harm): whether harmful content was detected, its severity level (a scale that indicates the intensity and risk level of harmful content, such as safe, low, medium, or high), and whether it was filtered. Also reports whether jailbreak content and profanity were detected and filtered, and whether a custom blocklist filtered the content, along with the blocklist ID. No
revised_prompt string The prompt that was used to generate the image, if there was any revision to the prompt. No
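When a request fails, the inner error code identifies Responsible AI policy rejections. The following defensive sketch reads error.inner_error.code from a dalleErrorResponse body and tolerates missing fields and non-JSON bodies. The helper names are hypothetical.

```python
import json
from typing import Optional


def inner_error_code(body: bytes) -> Optional[str]:
    """Return error.inner_error.code from a dalleErrorResponse body, or None."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    inner = error.get("inner_error") if isinstance(error, dict) else None
    code = inner.get("code") if isinstance(inner, dict) else None
    return code if isinstance(code, str) else None


def is_policy_violation(body: bytes) -> bool:
    """True when the inner error code is ResponsibleAIPolicyViolation."""
    return inner_error_code(body) == "ResponsibleAIPolicyViolation"
```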

contentFilterSeverityResult

Name Type Description Required Default
filtered boolean Yes
severity string No

contentFilterDetectedResult

Name Type Description Required Default
detected boolean No
filtered boolean Yes

contentFilterDetailedResults

Content filtering results, with details of the content filter IDs for the filtered segments.

Name Type Description Required Default
details array No
filtered boolean Yes

dalleFilterResults

Content filtering results for each category (hate, sexual, violence, self_harm): whether harmful content was detected, its severity level (a scale that indicates the intensity and risk level of harmful content, such as safe, low, medium, or high), and whether it was filtered. Also reports whether jailbreak content and profanity were detected and filtered, and whether a custom blocklist filtered the content, along with the blocklist ID.

Name Type Description Required Default
custom_blocklists contentFilterDetailedResults Content filtering results, with details of the content filter IDs for the filtered segments. No
hate contentFilterSeverityResult No
jailbreak contentFilterDetectedResult No
profanity contentFilterDetectedResult No
self_harm contentFilterSeverityResult No
sexual contentFilterSeverityResult No
violence contentFilterSeverityResult No
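Every category object in these results carries a required filtered flag, so a client can list what was blocked without knowing the shape of each category. The following minimal sketch works on either content_filter_results or prompt_filter_results. The helper name is hypothetical.

```python
from typing import Dict, List


def filtered_categories(results: Dict[str, dict]) -> List[str]:
    """Return the sorted names of categories whose result has filtered == true."""
    return sorted(name for name, result in results.items()
                  if isinstance(result, dict) and result.get("filtered"))
```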

audioResponseFormat

Defines the format of the output.

Property Value
Description Defines the format of the output.
Type string
Values json, text, srt, verbose_json, vtt

imageQuality

The quality of the image that will be generated.

Property Value
Description The quality of the image that will be generated.
Type string
Default auto
Values auto, high, medium, low, hd, standard

imagesResponseFormat

The format in which the generated images are returned.

Property Value
Description The format in which the generated images are returned.
Type string
Default url
Values url, b64_json

imagesOutputFormat

The file format in which the generated images are returned. Only supported for gpt-image-1 series models.

Property Value
Description The file format in which the generated images are returned. Only supported for gpt-image-1 series models.
Type string
Default png
Values png, jpeg

imageSize

The size of the generated images.

Property Value
Description The size of the generated images.
Type string
Default auto
Values auto, 1792x1024, 1024x1792, 1024x1024, 1024x1536, 1536x1024

imageStyle

The style of the generated images. Only supported for dall-e-3.

Property Value
Description The style of the generated images. Only supported for dall-e-3.
Type string
Default vivid
Values vivid, natural

imageBackground

Sets the transparency of the background of the generated image(s). This parameter is only supported for gpt-image-1 series models.

Property Value
Description Sets the transparency of the background of the generated image(s). This parameter is only supported for gpt-image-1 series models.
Type string
Default auto
Values transparent, opaque, auto

generateImagesResponse

Name Type Description Required Default
created integer The Unix timestamp of when the operation was created. Yes
data array The result data of the operation, if successful. Yes
usage imageGenerationsUsage Represents token usage details for image generation requests. Only for gpt-image-1 series models. No

imageGenerationsUsage

Represents token usage details for image generation requests. Only for gpt-image-1 series models.

Name Type Description Required Default
input_tokens integer The number of input tokens. No
input_tokens_details object A detailed breakdown of the input tokens. No
└─ image_tokens integer The number of image tokens. No
└─ text_tokens integer The number of text tokens. No
output_tokens integer The number of output tokens. No
total_tokens integer The total number of tokens used. No
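usage is returned only for gpt-image-1 series models, so a client should treat it as optional. The following sketch flattens imageGenerationsUsage into plain counts and defaults missing values to 0. The helper name and the zero defaults are assumptions of this sketch.

```python
from typing import Dict


def usage_summary(response: dict) -> Dict[str, int]:
    """Return token counts from generateImagesResponse.usage, defaulting to 0."""
    usage = response.get("usage") or {}  # absent for non-gpt-image-1 models
    details = usage.get("input_tokens_details") or {}
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "image_tokens": details.get("image_tokens", 0),
        "text_tokens": details.get("text_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```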

Next steps

Learn about models and fine-tuning with the REST API. Learn more about the underlying models that power Azure OpenAI.