Translation Operations - Create Translation
Creates a translation.
PUT {endpoint}/videotranslation/translations/{translationId}?api-version=2025-05-20
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
endpoint | path | True | string | Supported Cognitive Services endpoints (protocol and hostname, for example: https://eastus.api.cognitive.microsoft.com). |
translationId | path | True | string minLength: 3, maxLength: 64, pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$ | Translation resource ID. |
api-version | query | True | string minLength: 1 | The API version to use for this operation. |
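The URI parameters above can be assembled and checked on the client before any request is sent. The following is a minimal sketch, not an official client: the function name `build_translation_url` is hypothetical, and only the path, the ID pattern, and the api-version value are taken from this page.

```python
import re
from urllib.parse import quote, urlencode

# Pattern copied from the translationId constraint in the URI parameters table.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$")


def build_translation_url(endpoint: str, translation_id: str,
                          api_version: str = "2025-05-20") -> str:
    """Return the PUT URL for the Create Translation operation.

    Hypothetical helper: raises ValueError when the ID does not satisfy
    the documented pattern (which also enforces the 3 to 64 length range).
    """
    if not ID_PATTERN.fullmatch(translation_id):
        raise ValueError(f"invalid translation ID: {translation_id!r}")
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/videotranslation/translations/{quote(translation_id)}?{query}"
```

Validating the ID locally avoids a round trip for a request the service would presumably reject anyway.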
Request Header
Name | Required | Type | Description |
---|---|---|---|
Operation-Id | True | string minLength: 3, maxLength: 64, pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$ | Operation ID. |
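The Operation-Id header is supplied by the caller, so each request needs a value that fits the pattern above. One possible approach, shown as a sketch, is to append a UUID to a short prefix. The helper name `new_operation_id` and the "Create" prefix are illustrative assumptions, not requirements of the API.

```python
import re
import uuid

# Pattern copied from the Operation-Id constraint in the request header table.
OPERATION_ID_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$")


def new_operation_id(prefix: str = "Create") -> str:
    """Return a unique Operation-Id of the form '<prefix>-<uuid4>'.

    Hypothetical helper: a UUID contains only hex digits and hyphens,
    so the result fits the pattern as long as the prefix does.
    """
    candidate = f"{prefix}-{uuid.uuid4()}"
    if not OPERATION_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"prefix produces an invalid Operation-Id: {candidate!r}")
    return candidate
```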
Request Body
Name | Required | Type | Description |
---|---|---|---|
input | True | TranslationInput | Translation input. |
description | | string | Translation description. |
displayName | | string | Translation display name. |
Responses
Name | Type | Description |
---|---|---|
200 OK | Translation | The request has succeeded. Headers: Operation-Location: string |
201 Created | Translation | The request has succeeded and a new resource has been created as a result. Headers: Operation-Location: string |
Other Status Codes | Azure.Core.Foundations.ErrorResponse | An unexpected error response. Headers: x-ms-error-code: string |
Security
Ocp-Apim-Subscription-Key
Provide your Speech resource key here.
Type: apiKey
In: header
AADToken
These are the Microsoft identity platform flows.
Type: oauth2
Flow: implicit
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize
Scopes
Name | Description |
---|---|
https://cognitiveservices.azure.com/.default |
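The two security schemes above translate into one of two request headers. The sketch below builds either one. It assumes the OAuth2 token is sent as a standard `Authorization: Bearer` header, which this page does not spell out, and the function name `auth_headers` is hypothetical.

```python
from typing import Dict, Optional


def auth_headers(subscription_key: Optional[str] = None,
                 bearer_token: Optional[str] = None) -> Dict[str, str]:
    """Return the header for exactly one of the two documented schemes.

    Assumption: the AADToken scheme uses the conventional
    'Authorization: Bearer <token>' header.
    """
    if (subscription_key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of subscription_key or bearer_token")
    if subscription_key is not None:
        # apiKey scheme: the Speech resource key goes in this header.
        return {"Ocp-Apim-Subscription-Key": subscription_key}
    return {"Authorization": f"Bearer {bearer_token}"}
```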
Examples
Create Translation
Sample request
PUT {endpoint}/videotranslation/translations/TranslateMyZhCNVideo?api-version=2025-05-20
{
  "displayName": "hello.mp4",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "enableLipSync": true,
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  }
}
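The sample request above can be prepared with the Python standard library alone. This is a sketch under stated assumptions: `build_create_request` is a hypothetical helper, key-based authentication is used, and the function only constructs the request object without sending it.

```python
import json
import urllib.request


def build_create_request(endpoint: str, translation_id: str, operation_id: str,
                         subscription_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request for Create Translation."""
    url = (f"{endpoint.rstrip('/')}/videotranslation/translations/"
           f"{translation_id}?api-version=2025-05-20")
    headers = {
        "Content-Type": "application/json",
        "Operation-Id": operation_id,                   # required request header
        "Ocp-Apim-Subscription-Key": subscription_key,  # apiKey security scheme
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Body mirroring the sample request; the blob URL is a placeholder.
sample_body = {
    "displayName": "hello.mp4",
    "description": "Translate video from en-US to zh-CN.",
    "input": {
        "sourceLocale": "en-US",
        "targetLocale": "zh-CN",
        "voiceKind": "PlatformVoice",
        "enableLipSync": True,
        "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4",
    },
}
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect in tests.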
Sample response
Operation-Location: https://eastus.api.cognitive.microsoft.com/videotranslation/operations/Create-TranslateMyZhCNVideo?api-version=2024-02-01-preview
Operation-Id: Create-TranslateMyZhCNVideo
{
  "id": "TranslateMyZhCNVideo",
  "displayName": "hello.mp4",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  },
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "latestIteration": {
    "id": "Initial",
    "status": "NotStarted",
    "input": {
      "speakerCount": 3,
      "subtitleMaxCharCountPerSegment": 80,
      "webvttFile": {
        "url": "https://xxx.blob.core.windows.net/container1/myvtt.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx",
        "kind": "MetadataJson"
      }
    }
  }
}
Operation-Location: https://eastus.api.cognitive.microsoft.com/videotranslation/operations/Create-TranslateMyZhCNVideo?api-version=2024-02-01-preview
Operation-Id: Create-TranslateMyZhCNVideo
{
  "id": "TranslateMyZhCNVideo",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  },
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "latestIteration": {
    "id": "Initial",
    "status": "NotStarted",
    "input": {
      "speakerCount": 3,
      "subtitleMaxCharCountPerSegment": 80,
      "webvttFile": {
        "url": "https://xxx.blob.core.windows.net/container1/myvtt.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx",
        "kind": "MetadataJson"
      }
    }
  }
}
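Both sample responses carry an Operation-Location header that points at the operation resource. As a sketch, the operation ID can be recovered from that URL, and the latest iteration status read from the body. The helper names are hypothetical, and the code assumes only the response shape shown in the samples.

```python
from typing import Optional
from urllib.parse import urlsplit


def operation_id_from_location(operation_location: str) -> str:
    """Return the last path segment of an Operation-Location URL."""
    path = urlsplit(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def latest_iteration_status(translation: dict) -> Optional[str]:
    """Return latestIteration.status, or None when no iteration is present."""
    iteration = translation.get("latestIteration") or {}
    return iteration.get("status")
```

For the sample header value, `operation_id_from_location` returns `Create-TranslateMyZhCNVideo`, which matches the Operation-Id shown in the sample response.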
Definitions
Name | Description |
---|---|
Azure.Core.Foundations.Error | The error object. |
Azure.Core.Foundations.ErrorResponse | A response containing error details. |
Azure.Core.Foundations.InnerError | An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#handling-errors. |
EnableEmotionalPlatformVoice | Enable emotional platform voice kind. |
Iteration | Runs one iteration that translates one video file from the source locale to the target locale. A WebVTT file for content editing is an optional request parameter. |
IterationInput | Iteration input. |
IterationResult | Iteration result. |
Status | Task status. |
Translation | Create translation resource for hosting iterations of translating one video file from source locale to target locale. |
TranslationInput | Translation input. |
VoiceKind | TTS voice kind. |
WebvttFile | Translation webvtt file. |
WebvttFileKind | Webvtt file kind. |
Azure.Core.Foundations.Error
The error object.
Name | Type | Description |
---|---|---|
code | string | One of a server-defined set of error codes. |
details | Azure.Core.Foundations.Error[] | An array of details about specific errors that led to this reported error. |
innererror | Azure.Core.Foundations.InnerError | An object containing more specific information than the current object about the error. |
message | string | A human-readable representation of the error. |
target | string | The target of the error. |
Azure.Core.Foundations.ErrorResponse
A response containing error details.
Name | Type | Description |
---|---|---|
error | Azure.Core.Foundations.Error | The error object. |
Azure.Core.Foundations.InnerError
An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#handling-errors.
Name | Type | Description |
---|---|---|
code | string | One of a server-defined set of error codes. |
innererror | Azure.Core.Foundations.InnerError | Inner error. |
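Because `innererror` can nest, the most specific error code may sit several levels below the top-level `error` object. The sketch below walks that chain and collects every code, outermost first. The function name `error_codes` and the code values in the usage note are hypothetical, chosen only to illustrate the shape described in the three error definitions.

```python
from typing import List


def error_codes(error_response: dict) -> List[str]:
    """Collect 'code' values from error, then each nested innererror."""
    codes: List[str] = []
    node = error_response.get("error")
    while isinstance(node, dict):
        code = node.get("code")
        if code is not None:
            codes.append(code)
        node = node.get("innererror")  # descend one level; stops when absent
    return codes
```

Called on a payload whose `error` has code `OuterCode` and whose `innererror` has code `InnerCode`, the function returns both values in that order.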
EnableEmotionalPlatformVoice
Enable emotional platform voice kind.
Value | Description |
---|---|
Auto | Let the API decide whether to enable emotional voice for the target locale. |
Enable | Force emotional voice on if a voice that supports emotion exists for the target locale. |
Disable | Disable platform voice emotion for the target locale. |
Iteration
Runs one iteration that translates one video file from the source locale to the target locale. A WebVTT file for content editing is an optional request parameter.
Name | Type | Description |
---|---|---|
createdDateTime | string (date-time) | The timestamp when the object was created. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
description | string | Iteration description. |
failureReason | string | Iteration failure reason. |
id | string minLength: 3, maxLength: 64, pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$ | Iteration ID. |
input | IterationInput | Iteration input. |
lastActionDateTime | string (date-time) | The timestamp when the current status was entered. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
result | IterationResult | Iteration result. |
status | Status | Iteration status. |
IterationInput
Iteration input.
Name | Type | Description |
---|---|---|
enableEmotionalPlatformVoice | EnableEmotionalPlatformVoice | Specifies whether to enable emotion for the platform voice. If not specified, the server decides whether to apply emotion based on the target locale to optimize quality. |
enableOcrCorrectionFromSubtitle | boolean | Indicates whether the API may correct the speech recognition (SR) results using the subtitles from the original video file. Using the existing subtitles can improve the accuracy of the transcribed text. If not specified, no correction from OCR subtitles is applied. |
enableVideoSpeedAdjustment | boolean | Allows the video playback speed to be adjusted for better alignment with the translated audio. When enabled, the API can slow down or speed up the video to match the timing of the translated audio. If not specified, the video speed is not adjusted. |
exportSubtitleInVideo | boolean | Export subtitle in video. If not specified, it inherits the value defined in the input of translation creation. |
exportTargetLocaleAdvancedSubtitleFile | boolean | When enabled, the API exports subtitles in the Advanced SubStation Alpha format. The subtitle file can specify font styles and colors, which helps address character display issues in certain target locales such as Arabic (Ar), Japanese (Ja), Korean (Ko), and Chinese (Ch). If not specified, the iteration response does not include the advanced subtitle file. |
speakerCount | integer (int32) | Number of speakers in the video. If not specified, it inherits the value defined in the input of translation creation. |
subtitleFontSize | integer (int32) | Font size of subtitles in the video translation output, between 5 and 30. If not specified, the language-dependent default value is used. |
subtitleMaxCharCountPerSegment | integer (int32) | Maximum number of subtitle characters displayed per segment. If not specified, it inherits the value defined in the input of translation creation. |
subtitleOutlineColor | string minLength: 6, maxLength: 9, pattern: ^#?(?:[0-9A-Fa-f]{6}\|[0-9A-Fa-f]{8})$ | Outline color of the subtitles in the video translation output. Provide the value in the format <rr><gg><bb>, #<rr><gg><bb>, <rr><gg><bb><aa> or #<rr><gg><bb><aa>, where <rr>, <gg>, <bb>, and <aa> are the red, green, blue, and alpha components. For example, EBA205 or #EBA205 sets the color to a specific shade of yellow. If not specified, the default color is black. |
subtitlePrimaryColor | string minLength: 6, maxLength: 9, pattern: ^#?(?:[0-9A-Fa-f]{6}\|[0-9A-Fa-f]{8})$ | Primary color of the subtitles in the video translation output. Provide the value in the format <rr><gg><bb>, #<rr><gg><bb>, <rr><gg><bb><aa> or #<rr><gg><bb><aa>, where <rr>, <gg>, <bb>, and <aa> are the red, green, blue, and alpha components. For example, EBA205 or #EBA205 sets the color to a specific shade of yellow. If not specified, the default color is white. |
ttsCustomLexiconFileIdInAudioContentCreation | string pattern: ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$ | Translate with a TTS custom lexicon for speech synthesis. Provide the custom lexicon file using either ttsCustomLexiconFileUrl or ttsCustomLexiconFileIdInAudioContentCreation. These parameters are mutually exclusive: only one of them is required. If both are provided, the request is considered invalid. |
ttsCustomLexiconFileUrl | string (uri) | Translate with a TTS custom lexicon for speech synthesis. Provide the custom lexicon file using either ttsCustomLexiconFileUrl or ttsCustomLexiconFileIdInAudioContentCreation. These parameters are mutually exclusive: only one of them is required. If both are provided, the request is considered invalid. |
webvttFile | WebvttFile | WebVTT file for content editing. This parameter is required from the second iteration creation request of the translation onward. |
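Several IterationInput constraints above can be checked locally before a request is sent. The sketch below covers the color pattern, the font size range, and the mutually exclusive lexicon parameters. The function name `validate_iteration_input` is hypothetical, and treating the 5 to 30 range as inclusive is an assumption, since the table does not say.

```python
import re
from typing import List

# Pattern copied from the subtitleOutlineColor / subtitlePrimaryColor rows.
COLOR_PATTERN = re.compile(r"^#?(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$")


def validate_iteration_input(iteration_input: dict) -> List[str]:
    """Return a list of human-readable problems; empty when none are found."""
    problems: List[str] = []
    for key in ("subtitlePrimaryColor", "subtitleOutlineColor"):
        value = iteration_input.get(key)
        if value is not None and not COLOR_PATTERN.fullmatch(value):
            problems.append(f"{key} must look like RRGGBB or RRGGBBAA, optionally with '#'")
    size = iteration_input.get("subtitleFontSize")
    if size is not None and not 5 <= size <= 30:  # assumed inclusive bounds
        problems.append("subtitleFontSize must be between 5 and 30")
    if ("ttsCustomLexiconFileUrl" in iteration_input
            and "ttsCustomLexiconFileIdInAudioContentCreation" in iteration_input):
        problems.append("provide only one TTS custom lexicon parameter")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.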
IterationResult
Iteration result.
Name | Type | Description |
---|---|---|
metadataJsonWebvttFileUrl | string (uri) | Metadata JSON webvtt file URL. |
reportFileUrl | string (uri) | Report file URL. |
sourceLocaleSubtitleWebvttFileUrl | string (uri) | Source locale subtitle file URL. |
targetLocaleAdvancedSubtitleFileUrl | string (uri) | The URL of the target locale Advanced SubStation Alpha (ASS) subtitle file. It is populated only when exportTargetLocaleAdvancedSubtitleFile is set to true during iteration creation; otherwise, this property is not included in the response. |
targetLocaleSubtitleWebvttFileUrl | string (uri) | Target locale subtitle file URL. |
translatedAudioFileUrl | string (uri) | Translated audio file URL. |
translatedVideoFileUrl | string (uri) | Translated video file URL. |
Status
Task status.
Value | Description |
---|---|
NotStarted | Not started status |
Running | Running status |
Succeeded | Run succeeded status |
Failed | Run failed status |
Canceled | Canceled status |
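A client that polls for progress needs to know when to stop. The sketch below treats Succeeded, Failed, and Canceled as terminal and the other two values as in progress. That split is an inference from the value names, not something this page states, and the helper name `is_terminal` is hypothetical.

```python
# The five documented Status values.
ALL_STATUSES = frozenset({"NotStarted", "Running", "Succeeded", "Failed", "Canceled"})

# Assumption: these three values mean the task will not change state again.
TERMINAL_STATUSES = frozenset({"Succeeded", "Failed", "Canceled"})


def is_terminal(status: str) -> bool:
    """Return True when polling can stop; reject values outside the enum."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status in TERMINAL_STATUSES
```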
Translation
Create translation resource for hosting iterations of translating one video file from source locale to target locale.
Name | Type | Description |
---|---|---|
createdDateTime | string (date-time) | The timestamp when the object was created. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
description | string | Translation description. |
displayName | string | Translation display name. |
failureReason | string | Translation failure reason. |
id | string minLength: 3, maxLength: 64, pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$ | Translation resource ID. |
input | TranslationInput | Translation input. |
latestIteration | Iteration | Latest iteration of the translation. |
latestSucceededIteration | Iteration | Latest completed iteration of the translation. |
TranslationInput
Translation input.
Name | Type | Description |
---|---|---|
audioFileUrl | string (uri) | Azure blob URL of the audio file to translate, in .mp3 or .wav format, with a maximum file size of 5 GB and a maximum duration of 4 hours. Provide the input media file using either videoFileUrl or audioFileUrl. These parameters are mutually exclusive: only one of them is required. If both are provided, the request is considered invalid. |
enableLipSync | boolean | Indicates whether to enable lip sync. If not provided, the default value is false and lip sync is disabled. |
exportSubtitleInVideo | boolean | Export subtitle in video. If not specified, the default value is false and subtitles are not burned into the translated video file. |
sourceLocale | string minLength: 5, maxLength: 16, pattern: ^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}\|[0-9]{3}))?$ | The source locale of the video file. Locale code follows BCP-47. You can find the text to speech locale list at https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts. If not specified, the source locale is auto-detected from the video file. Auto-detection is only supported after version 2025-05-20. |
speakerCount | integer (int32) | Number of speakers in the video. If not provided, it is auto-detected from the video file. |
subtitleMaxCharCountPerSegment | integer (int32) | Maximum number of subtitle characters displayed per segment. If not provided, the language-dependent default value is used. |
targetLocale | string minLength: 5, maxLength: 16, pattern: ^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}\|[0-9]{3}))?$ | The target locale of the translation. Locale code follows BCP-47. You can find the text to speech locale list at https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts. |
videoFileUrl | string (uri) | Azure blob URL of the video file to translate, in .mp4 format, with a maximum file size of 5 GB and a maximum duration of 4 hours. Provide the input media file using either videoFileUrl or audioFileUrl. These parameters are mutually exclusive: only one of them is required. If both are provided, the request is considered invalid. |
voiceKind | VoiceKind | Translation voice kind. |
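The TranslationInput rules above lend themselves to a small pre-flight check. The sketch below verifies the media URL exclusivity and the locale constraints. The function name `validate_translation_input` is hypothetical. Requiring exactly one media URL is one reading of the descriptions: they state that supplying both is invalid, and supplying neither is assumed here to be invalid too.

```python
import re
from typing import List

# Pattern copied from the sourceLocale / targetLocale rows.
LOCALE_PATTERN = re.compile(
    r"^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$"
)


def validate_translation_input(translation_input: dict) -> List[str]:
    """Return a list of problems found in a TranslationInput dict."""
    problems: List[str] = []
    has_video = "videoFileUrl" in translation_input
    has_audio = "audioFileUrl" in translation_input
    if has_video == has_audio:  # both present, or (by assumption) both absent
        problems.append("provide exactly one of videoFileUrl or audioFileUrl")
    for key in ("sourceLocale", "targetLocale"):
        value = translation_input.get(key)
        if value is None:
            continue  # this sketch checks only locales that are present
        if not 5 <= len(value) <= 16 or not LOCALE_PATTERN.fullmatch(value):
            problems.append(f"{key} is not a valid locale code: {value!r}")
    return problems
```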
VoiceKind
TTS voice kind.
Value | Description |
---|---|
PlatformVoice | TTS platform voice |
PersonalVoice | TTS personal voice |
WebvttFile
Translation webvtt file.
Name | Type | Description |
---|---|---|
kind | WebvttFileKind | Translation webvtt file kind. |
url | string (uri) | Translation webvtt file URL. |
WebvttFileKind
Webvtt file kind.
Value | Description |
---|---|
SourceLocaleSubtitle | Source locale plain text subtitle webvtt file |
TargetLocaleSubtitle | Target locale plain text subtitle webvtt file |
MetadataJson | Target locale metadata JSON webvtt file |