Translation Operations - Create Translation

Creates a translation.

PUT {endpoint}/videotranslation/translations/{translationId}?api-version=2025-05-20

URI Parameters

Name In Required Type Description
endpoint
path True

string

Supported Cognitive Services endpoints (protocol and hostname, for example: https://eastus.api.cognitive.microsoft.com).

translationId
path True

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Translation resource ID.

api-version
query True

string

minLength: 1

The API version to use for this operation.
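The `translationId` path parameter (and the `Operation-Id` header below) must satisfy the documented length and pattern constraints. A minimal sketch of client-side validation, using the exact regular expression from this reference (`is_valid_id` is a hypothetical helper name):

```python
import re

# Pattern from the API reference for translationId and Operation-Id:
# 3-64 characters, alphanumeric at both ends, with dots, underscores,
# and hyphens allowed in between.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$")

def is_valid_id(candidate: str) -> bool:
    """Check a candidate translation or operation ID against the documented pattern."""
    return bool(ID_PATTERN.fullmatch(candidate))

print(is_valid_id("TranslateMyZhCNVideo"))  # True
print(is_valid_id("ab"))                    # False: shorter than 3 characters
print(is_valid_id("-bad-start"))            # False: must start alphanumeric
```

Validating IDs before sending the request avoids a round trip that would end in a 4xx error response.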

Request Header

Name Required Type Description
Operation-Id True

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Operation ID.

Request Body

Name Required Type Description
input True

TranslationInput

Translation input.

description

string

Translation description.

displayName

string

Translation display name.

Responses

Name Type Description
200 OK

Translation

The request has succeeded.

Headers

Operation-Location: string

201 Created

Translation

The request has succeeded and a new resource has been created as a result.

Headers

Operation-Location: string

Other Status Codes

Azure.Core.Foundations.ErrorResponse

An unexpected error response.

Headers

x-ms-error-code: string

Security

Ocp-Apim-Subscription-Key

Provide your Speech resource key here.

Type: apiKey
In: header

AADToken

These are the Microsoft identity platform flows.

Type: oauth2
Flow: implicit
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize

Scopes

Name Description
https://cognitiveservices.azure.com/.default

Examples

Create Translation

Sample request

PUT {endpoint}/videotranslation/translations/TranslateMyZhCNVideo?api-version=2025-05-20


{
  "displayName": "hello.mp4",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "enableLipSync": true,
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  }
}
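The sample request above can be assembled programmatically. The sketch below only builds the URL, headers, and JSON payload from the documented parameters; `build_create_translation_request` is a hypothetical helper, and actually sending the request (for example with `requests.put`) is left to the caller:

```python
import json

def build_create_translation_request(endpoint: str, translation_id: str,
                                     operation_id: str, body: dict):
    """Assemble the URL, headers, and payload for the Create Translation PUT.

    Hypothetical helper: it does not perform any network I/O.
    """
    url = (f"{endpoint}/videotranslation/translations/{translation_id}"
           f"?api-version=2025-05-20")
    headers = {
        "Operation-Id": operation_id,  # required request header
        "Ocp-Apim-Subscription-Key": "<your-speech-resource-key>",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_create_translation_request(
    "https://eastus.api.cognitive.microsoft.com",
    "TranslateMyZhCNVideo",
    "Create-TranslateMyZhCNVideo",
    {
        "displayName": "hello.mp4",
        "description": "Translate video from en-US to zh-CN.",
        "input": {
            "sourceLocale": "en-US",
            "targetLocale": "zh-CN",
            "voiceKind": "PlatformVoice",
            "enableLipSync": True,
            "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?<sas-token>",
        },
    },
)
# e.g. requests.put(url, headers=headers, data=payload)
```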

Sample response

Status code: 200

Operation-Location: https://eastus.api.cognitive.microsoft.com/videotranslation/operations/Create-TranslateMyZhCNVideo?api-version=2024-02-01-preview
Operation-Id: Create-TranslateMyZhCNVideo
{
  "id": "TranslateMyZhCNVideo",
  "displayName": "hello.mp4",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  },
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "latestIteration": {
    "id": "Initial",
    "status": "NotStarted",
    "input": {
      "speakerCount": 3,
      "subtitleMaxCharCountPerSegment": 80,
      "webvttFile": {
        "url": "https://xxx.blob.core.windows.net/container1/myvtt.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx",
        "kind": "MetadataJson"
      }
    }
  }
}

Status code: 201
Operation-Location: https://eastus.api.cognitive.microsoft.com/videotranslation/operations/Create-TranslateMyZhCNVideo?api-version=2024-02-01-preview
Operation-Id: Create-TranslateMyZhCNVideo
{
  "id": "TranslateMyZhCNVideo",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  },
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "latestIteration": {
    "id": "Initial",
    "status": "NotStarted",
    "input": {
      "speakerCount": 3,
      "subtitleMaxCharCountPerSegment": 80,
      "webvttFile": {
        "url": "https://xxx.blob.core.windows.net/container1/myvtt.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx",
        "kind": "MetadataJson"
      }
    }
  }
}
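Creating a translation is a long-running operation: the response carries an `Operation-Location` header, and the client polls that URL until the operation's Status (defined below) reaches a terminal value. A minimal polling sketch; `poll_operation` is a hypothetical helper, and `fetch_status` stands in for a GET against the Operation-Location URL:

```python
import time

# Terminal values from the Status definition below.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Canceled"}

def poll_operation(fetch_status, interval_seconds: float = 2.0, max_polls: int = 100):
    """Poll a long-running operation until it reaches a terminal Status.

    `fetch_status` is a caller-supplied function that GETs the URL from the
    Operation-Location response header and returns its "status" field.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("operation did not finish within max_polls")
```

In practice, `fetch_status` would issue an authenticated GET to the Operation-Location URL and read the `status` field of the returned JSON.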

Definitions

Name Description
Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#handling-errors.

EnableEmotionalPlatformVoice

Enable emotional platform voice kind.

Iteration

Runs one iteration of translating a video file from the source locale to the target locale. A WebVTT file for content editing is an optional request parameter.

IterationInput

Iteration input.

IterationResult

Iteration result.

Status

Task status.

Translation

A translation resource that hosts iterations of translating one video file from the source locale to the target locale.

TranslationInput

Translation input.

VoiceKind

TTS voice kind.

WebvttFile

Translation webvtt file.

WebvttFileKind

Webvtt file kind.

Azure.Core.Foundations.Error

The error object.

Name Type Description
code

string

One of a server-defined set of error codes.

details

Azure.Core.Foundations.Error[]

An array of details about specific errors that led to this reported error.

innererror

Azure.Core.Foundations.InnerError

An object containing more specific information than the current object about the error.

message

string

A human-readable representation of the error.

target

string

The target of the error.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Name Type Description
error

Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#handling-errors.

Name Type Description
code

string

One of a server-defined set of error codes.

innererror

Azure.Core.Foundations.InnerError

Inner error.

EnableEmotionalPlatformVoice

Enable emotional platform voice kind.

Value Description
Auto

Lets the API decide whether to enable emotional voice for the target locale.

Enable

Forces emotional voice on when a voice with emotion support exists for the target locale.

Disable

Disables platform voice emotion for the target locale.

Iteration

Runs one iteration of translating a video file from the source locale to the target locale. A WebVTT file for content editing is an optional request parameter.

Name Type Description
createdDateTime

string (date-time)

The timestamp when the object was created. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).

description

string

Iteration description

failureReason

string

Iteration failure reason

id

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Iteration ID

input

IterationInput

Iteration input.

lastActionDateTime

string (date-time)

The timestamp when the current status was entered. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).

result

IterationResult

Iteration result.

status

Status

Iteration status

IterationInput

Iteration input.

Name Type Description
enableEmotionalPlatformVoice

EnableEmotionalPlatformVoice

Specifies whether to enable emotion for platform voice. If not specified, the server decides based on the target locale whether to apply emotion, to optimize quality.

enableOcrCorrectionFromSubtitle

boolean

Indicates whether to allow the API to correct the speech recognition (SR) results using the subtitles from the original video file. Leveraging the existing subtitles can improve the accuracy of the transcribed text, making the final output more precise and reliable. If not specified, the translation performs no correction from OCR subtitles.

enableVideoSpeedAdjustment

boolean

Allows adjustment of video playback speed for better alignment with the translated audio. When enabled, the API can slow down or speed up the video to match the timing of the translated audio, providing a more synchronized viewing experience. If not specified, the video speed is not adjusted.

exportSubtitleInVideo

boolean

Whether to burn subtitles into the video. If not specified, the value defined in the translation creation input is inherited.

exportTargetLocaleAdvancedSubtitleFile

boolean

When enabled, allows the API to export subtitles in the Advanced SubStation Alpha format. The subtitle file can specify font styles and colors, which helps address character display issues in certain target locales such as Arabic (Ar), Japanese (Ja), Korean (Ko), and Chinese (Ch). This ensures that subtitles are visually appealing and correctly rendered across different languages and regions. If not specified, the iteration response does not include the advanced subtitle file.

speakerCount

integer (int32)

Number of speakers in the video. If not specified, the value defined in the translation creation input is inherited.

subtitleFontSize

integer (int32)

Font size of subtitles in the video translation output, between 5 and 30. If not specified, a language-dependent default value is used.

subtitleMaxCharCountPerSegment

integer (int32)

Maximum subtitle display character count per segment. If not specified, the value defined in the translation creation input is inherited.

subtitleOutlineColor

string

minLength: 6
maxLength: 9
pattern: ^#?(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$

Specifies the outline color of the subtitles in the video translation output. Provide the value in the format <rr><gg><bb>, #<rr><gg><bb>, <rr><gg><bb><aa>, or #<rr><gg><bb><aa>, where <rr>, <gg>, <bb>, and <aa> are the red, green, blue, and alpha components. For example, EBA205 or #EBA205 sets the outline to a specific shade of yellow. This allows customization of subtitle appearance to enhance readability and visual appeal. If not specified, the default color, black, is used.

subtitlePrimaryColor

string

minLength: 6
maxLength: 9
pattern: ^#?(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$

Specifies the primary color of the subtitles in the video translation output. Provide the value in the format <rr><gg><bb>, #<rr><gg><bb>, <rr><gg><bb><aa>, or #<rr><gg><bb><aa>, where <rr>, <gg>, <bb>, and <aa> are the red, green, blue, and alpha components. For example, EBA205 or #EBA205 sets the subtitles to a specific shade of yellow. This allows customization of subtitle appearance to enhance readability and visual appeal. If not specified, the default color, white, is used.

ttsCustomLexiconFileIdInAudioContentCreation

string

pattern: ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$

Translate with a TTS custom lexicon for speech synthesis. Provide the custom lexicon file using either ttsCustomLexiconFileUrl or ttsCustomLexiconFileIdInAudioContentCreation. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

ttsCustomLexiconFileUrl

string (uri)

Translate with a TTS custom lexicon for speech synthesis. Provide the custom lexicon file using either ttsCustomLexiconFileUrl or ttsCustomLexiconFileIdInAudioContentCreation. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

webvttFile

WebvttFile

WebVTT file for content editing. This parameter is required starting from the second iteration creation request of the translation.
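The subtitle color parameters above share one documented pattern. A minimal sketch of validating a color value client-side, using the exact regular expression from this reference (`is_valid_subtitle_color` is a hypothetical helper name):

```python
import re

# Documented pattern for subtitlePrimaryColor / subtitleOutlineColor:
# RRGGBB or RRGGBBAA hex, with an optional leading '#'.
COLOR_PATTERN = re.compile(r"^#?(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$")

def is_valid_subtitle_color(value: str) -> bool:
    """Check a candidate subtitle color against the documented pattern."""
    return bool(COLOR_PATTERN.fullmatch(value))

print(is_valid_subtitle_color("EBA205"))     # True  (RRGGBB)
print(is_valid_subtitle_color("#EBA205FF"))  # True  (#RRGGBBAA)
print(is_valid_subtitle_color("#EBA2"))      # False (wrong length)
```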

IterationResult

Iteration result.

Name Type Description
metadataJsonWebvttFileUrl

string (uri)

Metadata json webvtt file URL.

reportFileUrl

string (uri)

Report file URL.

sourceLocaleSubtitleWebvttFileUrl

string (uri)

Source locale subtitle file URL.

targetLocaleAdvancedSubtitleFileUrl

string (uri)

This property provides the URL of the target locale Advanced SubStation Alpha (ASS) subtitle file. It is populated only when exportTargetLocaleAdvancedSubtitleFile is set to true during iteration creation; otherwise, this property will not be included in the response.

targetLocaleSubtitleWebvttFileUrl

string (uri)

Target locale subtitle file URL.

translatedAudioFileUrl

string (uri)

Translated audio file URL.

translatedVideoFileUrl

string (uri)

Translated video file URL.

Status

Task status.

Value Description
NotStarted

Not started status

Running

Running status

Succeeded

Run succeeded status

Failed

Run failed status

Canceled

Canceled status

Translation

A translation resource that hosts iterations of translating one video file from the source locale to the target locale.

Name Type Description
createdDateTime

string (date-time)

The timestamp when the object was created. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).

description

string

Translation description.

displayName

string

Translation display name.

failureReason

string

Translation failure reason

id

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Translation resource ID.

input

TranslationInput

Translation input.

latestIteration

Iteration

Latest iteration of the translation.

latestSucceededIteration

Iteration

Latest completed iteration of the translation.

TranslationInput

Translation input.

Name Type Description
audioFileUrl

string (uri)

Azure Blob Storage URL of the translation audio file, in .mp3 or .wav format, with a maximum 5 GB file size and 4 hours of duration. Provide the input media file using either videoFileUrl or audioFileUrl. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

enableLipSync

boolean

Indicates whether to enable lip sync. If not provided, the default value is false (lip sync disabled).

exportSubtitleInVideo

boolean

Whether to burn subtitles into the video. If not specified, the default value is false and subtitles are not burned into the translated video file.

sourceLocale

string

minLength: 5
maxLength: 16
pattern: ^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$

The source locale of the video file. The locale code follows BCP-47. You can find the text to speech locale list at https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts. If not specified, the source locale is auto-detected from the video file; auto-detection is supported only in API version 2025-05-20 and later.

speakerCount

integer (int32)

Number of speakers in the video. If not provided, it is auto-detected from the video file.

subtitleMaxCharCountPerSegment

integer (int32)

Maximum subtitle display character count per segment. If not provided, a language-dependent default value is used.

targetLocale

string

minLength: 5
maxLength: 16
pattern: ^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$

The target locale of the translation. The locale code follows BCP-47. You can find the text to speech locale list at https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts.

videoFileUrl

string (uri)

Azure Blob Storage URL of the translation video file, in .mp4 format, with a maximum 5 GB file size and 4 hours of duration. Provide the input media file using either videoFileUrl or audioFileUrl. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

voiceKind

VoiceKind

Translation voice kind.
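The sourceLocale and targetLocale fields share one documented pattern plus length bounds. A minimal sketch of validating a locale value client-side, using the exact regular expression and the minLength/maxLength from this reference (`is_valid_locale` is a hypothetical helper name):

```python
import re

# Documented pattern for sourceLocale / targetLocale (BCP-47 style):
# a 2-4 letter language tag, optional 4-letter script, optional region,
# separated by '-' or '_'.
LOCALE_PATTERN = re.compile(
    r"^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$")

def is_valid_locale(value: str) -> bool:
    """Check a locale against the documented pattern and length bounds."""
    # The API additionally enforces minLength 5 and maxLength 16.
    return 5 <= len(value) <= 16 and bool(LOCALE_PATTERN.fullmatch(value))

print(is_valid_locale("en-US"))  # True
print(is_valid_locale("zh-CN"))  # True
print(is_valid_locale("en"))     # False: shorter than minLength 5
```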

VoiceKind

TTS voice kind.

Value Description
PlatformVoice

TTS platform voice

PersonalVoice

TTS personal voice

WebvttFile

Translation webvtt file.

Name Type Description
kind

WebvttFileKind

Translation webvtt file kind.

url

string (uri)

Translation webvtt file url.

WebvttFileKind

Webvtt file kind.

Value Description
SourceLocaleSubtitle

Source locale plain text subtitle webvtt file

TargetLocaleSubtitle

Target locale plain text subtitle webvtt file

MetadataJson

Target locale metadata JSON webvtt file
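The WebvttFile and WebvttFileKind definitions above combine in the iteration input, which is required from the second iteration onward. A hypothetical sketch of such an input, with placeholder storage account and SAS token values:

```python
# Hypothetical iteration input using the WebvttFile / WebvttFileKind
# definitions above; the URL values are placeholders.
iteration_input = {
    "speakerCount": 3,
    "subtitleMaxCharCountPerSegment": 80,
    "webvttFile": {
        # One of: SourceLocaleSubtitle, TargetLocaleSubtitle, MetadataJson
        "kind": "MetadataJson",
        "url": "https://<account>.blob.core.windows.net/container1/myvtt.vtt?<sas-token>",
    },
}

# Sanity check against the documented WebvttFileKind values.
assert iteration_input["webvttFile"]["kind"] in {
    "SourceLocaleSubtitle", "TargetLocaleSubtitle", "MetadataJson"}
```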