Translation Operations - Create Translation

Creates a translation.

PUT {endpoint}/videotranslation/translations/{translationId}?api-version=2025-05-20

URI Parameters

Name In Required Type Description
endpoint
path True

string

Supported Cognitive Services endpoints (protocol and hostname, for example: https://eastus.api.cognitive.microsoft.com).

translationId
path True

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Translation resource ID.

api-version
query True

string

minLength: 1

The API version to use for this operation.
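The `translationId` path parameter (and the `Operation-Id` header below) must satisfy the documented length and pattern constraints. A minimal sketch of client-side validation, using the exact regular expression from this reference (`is_valid_id` is a hypothetical helper name):

```python
import re

# Pattern from the API reference for translationId and Operation-Id:
# 3-64 characters, alphanumeric at both ends, with dots, underscores,
# and hyphens allowed in between.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$")

def is_valid_id(candidate: str) -> bool:
    """Check a candidate translation or operation ID against the documented pattern."""
    return bool(ID_PATTERN.fullmatch(candidate))

print(is_valid_id("TranslateMyZhCNVideo"))  # True
print(is_valid_id("ab"))                    # False: shorter than 3 characters
print(is_valid_id("-bad-start"))            # False: must start alphanumeric
```

Validating IDs before sending the request avoids a round trip that would end in a 4xx error response.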

Request Header

Name Required Type Description
Operation-Id True

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Operation ID.

Request Body

Name Required Type Description
input True

TranslationInput

Translation input.

description

string

Translation description.

displayName

string

Translation display name.

Responses

Name Type Description
200 OK

Translation

The request has succeeded.

Headers

Operation-Location: string

201 Created

Translation

The request has succeeded and a new resource has been created as a result.

Headers

Operation-Location: string

Other Status Codes

Azure.Core.Foundations.ErrorResponse

An unexpected error response.

Headers

x-ms-error-code: string

Security

Ocp-Apim-Subscription-Key

Provide your Speech resource key here.

Type: apiKey
In: header

AADToken

These are the Microsoft identity platform flows.

Type: oauth2
Flow: implicit
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize

Scopes

Name Description
https://cognitiveservices.azure.com/.default

Examples

Create Translation

Sample request

PUT {endpoint}/videotranslation/translations/TranslateMyZhCNVideo?api-version=2025-05-20


{
  "displayName": "hello.mp4",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "enableLipSync": true,
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  }
}
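The sample request above can be assembled programmatically. The sketch below only builds the URL, headers, and JSON payload from the documented parameters; `build_create_translation_request` is a hypothetical helper, and actually sending the request (for example with `requests.put`) is left to the caller:

```python
import json

def build_create_translation_request(endpoint: str, translation_id: str,
                                     operation_id: str, body: dict):
    """Assemble the URL, headers, and payload for the Create Translation PUT.

    Hypothetical helper: it does not perform any network I/O.
    """
    url = (f"{endpoint}/videotranslation/translations/{translation_id}"
           f"?api-version=2025-05-20")
    headers = {
        "Operation-Id": operation_id,  # required request header
        "Ocp-Apim-Subscription-Key": "<your-speech-resource-key>",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_create_translation_request(
    "https://eastus.api.cognitive.microsoft.com",
    "TranslateMyZhCNVideo",
    "Create-TranslateMyZhCNVideo",
    {
        "displayName": "hello.mp4",
        "description": "Translate video from en-US to zh-CN.",
        "input": {
            "sourceLocale": "en-US",
            "targetLocale": "zh-CN",
            "voiceKind": "PlatformVoice",
            "enableLipSync": True,
            "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?<sas-token>",
        },
    },
)
# e.g. requests.put(url, headers=headers, data=payload)
```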

Sample response

Status code: 200

Operation-Location: https://eastus.api.cognitive.microsoft.com/videotranslation/operations/Create-TranslateMyZhCNVideo?api-version=2024-02-01-preview
Operation-Id: Create-TranslateMyZhCNVideo
{
  "id": "TranslateMyZhCNVideo",
  "displayName": "hello.mp4",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  },
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "latestIteration": {
    "id": "Initial",
    "status": "NotStarted",
    "input": {
      "speakerCount": 3,
      "subtitleMaxCharCountPerSegment": 80,
      "webvttFile": {
        "url": "https://xxx.blob.core.windows.net/container1/myvtt.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx",
        "kind": "MetadataJson"
      }
    }
  }
}

Status code: 201
Operation-Location: https://eastus.api.cognitive.microsoft.com/videotranslation/operations/Create-TranslateMyZhCNVideo?api-version=2024-02-01-preview
Operation-Id: Create-TranslateMyZhCNVideo
{
  "id": "TranslateMyZhCNVideo",
  "description": "Translate video from en-US to zh-CN.",
  "input": {
    "sourceLocale": "en-US",
    "targetLocale": "zh-CN",
    "voiceKind": "PlatformVoice",
    "videoFileUrl": "https://mystorage.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx"
  },
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "latestIteration": {
    "id": "Initial",
    "status": "NotStarted",
    "input": {
      "speakerCount": 3,
      "subtitleMaxCharCountPerSegment": 80,
      "webvttFile": {
        "url": "https://xxx.blob.core.windows.net/container1/myvtt.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx",
        "kind": "MetadataJson"
      }
    }
  }
}
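Creating a translation is a long-running operation: the response carries an `Operation-Location` header, and the client polls that URL until the operation's Status (defined below) reaches a terminal value. A minimal polling sketch; `poll_operation` is a hypothetical helper, and `fetch_status` stands in for a GET against the Operation-Location URL:

```python
import time

# Terminal values from the Status definition below.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Canceled"}

def poll_operation(fetch_status, interval_seconds: float = 2.0, max_polls: int = 100):
    """Poll a long-running operation until it reaches a terminal Status.

    `fetch_status` is a caller-supplied function that GETs the URL from the
    Operation-Location response header and returns its "status" field.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("operation did not finish within max_polls")
```

In practice, `fetch_status` would issue an authenticated GET to the Operation-Location URL and read the `status` field of the returned JSON.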

Definitions

Name Description
Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#handling-errors.

EnableEmotionalPlatformVoice

Enable emotional platform voice kind.

Iteration

Runs one iteration of translating a video file from the source locale to the target locale. A WebVTT file for content editing is an optional request parameter.

IterationInput

Iteration input.

IterationResult

Iteration result.

Status

Task status.

Translation

A translation resource that hosts iterations of translating one video file from the source locale to the target locale.

TranslationInput

Translation input.

VoiceKind

TTS voice kind.

WebvttFile

Translation webvtt file.

WebvttFileKind

Webvtt file kind.

Azure.Core.Foundations.Error

The error object.

Name Type Description
code

string

One of a server-defined set of error codes.

details

Azure.Core.Foundations.Error[]

An array of details about specific errors that led to this reported error.

innererror

Azure.Core.Foundations.InnerError

An object containing more specific information than the current object about the error.

message

string

A human-readable representation of the error.

target

string

The target of the error.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Name Type Description
error

Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#handling-errors.

Name Type Description
code

string

One of a server-defined set of error codes.

innererror

Azure.Core.Foundations.InnerError

Inner error.

EnableEmotionalPlatformVoice

Enable emotional platform voice kind.

Value Description
Auto

Lets the API decide whether to enable emotional voice for the target locale.

Enable

Forces emotional voice on when a voice with emotion support exists for the target locale.

Disable

Disables platform voice emotion for the target locale.

Iteration

Runs one iteration of translating a video file from the source locale to the target locale. A WebVTT file for content editing is an optional request parameter.

Name Type Description
createdDateTime

string (date-time)

The timestamp when the object was created. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).

description

string

Iteration description

failureReason

string

Iteration failure reason

id

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Iteration ID

input

IterationInput

Iteration input.

lastActionDateTime

string (date-time)

The timestamp when the current status was entered. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).

result

IterationResult

Iteration result.

status

Status

Iteration status

IterationInput

Iteration input.

Name Type Description
enableEmotionalPlatformVoice

EnableEmotionalPlatformVoice

Specifies whether to enable emotion for platform voice. If not specified, the server decides based on the target locale whether to apply emotion, to optimize quality.

enableOcrCorrectionFromSubtitle

boolean

Indicates whether to allow the API to correct the speech recognition (SR) results using the subtitles from the original video file. Leveraging the existing subtitles can improve the accuracy of the transcribed text, making the final output more precise and reliable. If not specified, the translation performs no correction from OCR subtitles.

enableVideoSpeedAdjustment

boolean

Allows adjustment of video playback speed for better alignment with the translated audio. When enabled, the API can slow down or speed up the video to match the timing of the translated audio, providing a more synchronized viewing experience. If not specified, the video speed is not adjusted.

exportSubtitleInVideo

boolean

Whether to burn subtitles into the video. If not specified, the value defined in the translation creation input is inherited.

exportTargetLocaleAdvancedSubtitleFile

boolean

When enabled, allows the API to export subtitles in the Advanced SubStation Alpha format. The subtitle file can specify font styles and colors, which helps address character display issues in certain target locales such as Arabic (Ar), Japanese (Ja), Korean (Ko), and Chinese (Ch). This ensures that subtitles are visually appealing and correctly rendered across different languages and regions. If not specified, the iteration response does not include the advanced subtitle file.

speakerCount

integer (int32)

Number of speakers in the video. If not specified, the value defined in the translation creation input is inherited.

subtitleFontSize

integer (int32)

Font size of subtitles in the video translation output, between 5 and 30. If not specified, a language-dependent default value is used.

subtitleMaxCharCountPerSegment

integer (int32)

Maximum subtitle display character count per segment. If not specified, the value defined in the translation creation input is inherited.

subtitleOutlineColor

string

minLength: 6
maxLength: 9
pattern: ^#?(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$

Specifies the outline color of the subtitles in the video translation output. Provide the value in the format <rr><gg><bb>, #<rr><gg><bb>, <rr><gg><bb><aa>, or #<rr><gg><bb><aa>, where <rr>, <gg>, <bb>, and <aa> are the red, green, blue, and alpha components. For example, EBA205 or #EBA205 sets the outline to a specific shade of yellow. This allows customization of subtitle appearance to enhance readability and visual appeal. If not specified, the default color, black, is used.

subtitlePrimaryColor

string

minLength: 6
maxLength: 9
pattern: ^#?(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$

Specifies the primary color of the subtitles in the video translation output. Provide the value in the format <rr><gg><bb>, #<rr><gg><bb>, <rr><gg><bb><aa>, or #<rr><gg><bb><aa>, where <rr>, <gg>, <bb>, and <aa> are the red, green, blue, and alpha components. For example, EBA205 or #EBA205 sets the subtitles to a specific shade of yellow. This allows customization of subtitle appearance to enhance readability and visual appeal. If not specified, the default color, white, is used.

ttsCustomLexiconFileIdInAudioContentCreation

string

pattern: ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$

Translate with a TTS custom lexicon for speech synthesis. Provide the custom lexicon file using either ttsCustomLexiconFileUrl or ttsCustomLexiconFileIdInAudioContentCreation. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

ttsCustomLexiconFileUrl

string (uri)

Translate with a TTS custom lexicon for speech synthesis. Provide the custom lexicon file using either ttsCustomLexiconFileUrl or ttsCustomLexiconFileIdInAudioContentCreation. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

webvttFile

WebvttFile

WebVTT file for content editing. This parameter is required starting from the second iteration creation request of the translation.
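The subtitle color parameters above share one documented pattern. A minimal sketch of validating a color value client-side, using the exact regular expression from this reference (`is_valid_subtitle_color` is a hypothetical helper name):

```python
import re

# Documented pattern for subtitlePrimaryColor / subtitleOutlineColor:
# RRGGBB or RRGGBBAA hex, with an optional leading '#'.
COLOR_PATTERN = re.compile(r"^#?(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$")

def is_valid_subtitle_color(value: str) -> bool:
    """Check a candidate subtitle color against the documented pattern."""
    return bool(COLOR_PATTERN.fullmatch(value))

print(is_valid_subtitle_color("EBA205"))     # True  (RRGGBB)
print(is_valid_subtitle_color("#EBA205FF"))  # True  (#RRGGBBAA)
print(is_valid_subtitle_color("#EBA2"))      # False (wrong length)
```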

IterationResult

Iteration result.

Name Type Description
metadataJsonWebvttFileUrl

string (uri)

Metadata json webvtt file URL.

reportFileUrl

string (uri)

Report file URL.

sourceLocaleSubtitleWebvttFileUrl

string (uri)

Source locale subtitle file URL.

targetLocaleAdvancedSubtitleFileUrl

string (uri)

This property provides the URL of the target locale Advanced SubStation Alpha (ASS) subtitle file. It is populated only when exportTargetLocaleAdvancedSubtitleFile is set to true during iteration creation; otherwise, this property will not be included in the response.

targetLocaleSubtitleWebvttFileUrl

string (uri)

Target locale subtitle file URL.

translatedAudioFileUrl

string (uri)

Translated audio file URL.

translatedVideoFileUrl

string (uri)

Translated video file URL.

Status

Task status.

Value Description
NotStarted

Not started status

Running

Running status

Succeeded

Run succeeded status

Failed

Run failed status

Canceled

Canceled status

Translation

A translation resource that hosts iterations of translating one video file from the source locale to the target locale.

Name Type Description
createdDateTime

string (date-time)

The timestamp when the object was created. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).

description

string

Translation description.

displayName

string

Translation display name.

failureReason

string

Translation failure reason

id

string

minLength: 3
maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{1,62}[a-zA-Z0-9]$

Translation resource ID.

input

TranslationInput

Translation input.

latestIteration

Iteration

Latest iteration of the translation.

latestSucceededIteration

Iteration

Latest completed iteration of the translation.

TranslationInput

Translation input.

Name Type Description
audioFileUrl

string (uri)

Azure Blob Storage URL of the translation audio file, in .mp3 or .wav format, with a maximum 5 GB file size and 4 hours of duration. Provide the input media file using either videoFileUrl or audioFileUrl. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

enableLipSync

boolean

Indicates whether to enable lip sync. If not provided, the default value is false (lip sync disabled).

exportSubtitleInVideo

boolean

Whether to burn subtitles into the video. If not specified, the default value is false and subtitles are not burned into the translated video file.

sourceLocale

string

minLength: 5
maxLength: 16
pattern: ^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$

The source locale of the video file. The locale code follows BCP-47. You can find the text to speech locale list at https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts. If not specified, the source locale is auto-detected from the video file; auto-detection is supported only in API version 2025-05-20 and later.

speakerCount

integer (int32)

Number of speakers in the video. If not provided, it is auto-detected from the video file.

subtitleMaxCharCountPerSegment

integer (int32)

Maximum subtitle display character count per segment. If not provided, a language-dependent default value is used.

targetLocale

string

minLength: 5
maxLength: 16
pattern: ^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$

The target locale of the translation. The locale code follows BCP-47. You can find the text to speech locale list at https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts.

videoFileUrl

string (uri)

Azure Blob Storage URL of the translation video file, in .mp4 format, with a maximum 5 GB file size and 4 hours of duration. Provide the input media file using either videoFileUrl or audioFileUrl. These parameters are mutually exclusive; exactly one is required. If both are provided, the request is considered invalid.

voiceKind

VoiceKind

Translation voice kind.
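The sourceLocale and targetLocale fields share one documented pattern plus length bounds. A minimal sketch of validating a locale value client-side, using the exact regular expression and the minLength/maxLength from this reference (`is_valid_locale` is a hypothetical helper name):

```python
import re

# Documented pattern for sourceLocale / targetLocale (BCP-47 style):
# a 2-4 letter language tag, optional 4-letter script, optional region,
# separated by '-' or '_'.
LOCALE_PATTERN = re.compile(
    r"^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$")

def is_valid_locale(value: str) -> bool:
    """Check a locale against the documented pattern and length bounds."""
    # The API additionally enforces minLength 5 and maxLength 16.
    return 5 <= len(value) <= 16 and bool(LOCALE_PATTERN.fullmatch(value))

print(is_valid_locale("en-US"))  # True
print(is_valid_locale("zh-CN"))  # True
print(is_valid_locale("en"))     # False: shorter than minLength 5
```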

VoiceKind

TTS voice kind.

Value Description
PlatformVoice

TTS platform voice

PersonalVoice

TTS personal voice

WebvttFile

Translation webvtt file.

Name Type Description
kind

WebvttFileKind

Translation webvtt file kind.

url

string (uri)

Translation webvtt file url.

WebvttFileKind

Webvtt file kind.

Value Description
SourceLocaleSubtitle

Source locale plain text subtitle webvtt file

TargetLocaleSubtitle

Target locale plain text subtitle webvtt file

MetadataJson

Target locale metadata JSON webvtt file
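The WebvttFile and WebvttFileKind definitions above combine in the iteration input, which is required from the second iteration onward. A hypothetical sketch of such an input, with placeholder storage account and SAS token values:

```python
# Hypothetical iteration input using the WebvttFile / WebvttFileKind
# definitions above; the URL values are placeholders.
iteration_input = {
    "speakerCount": 3,
    "subtitleMaxCharCountPerSegment": 80,
    "webvttFile": {
        # One of: SourceLocaleSubtitle, TargetLocaleSubtitle, MetadataJson
        "kind": "MetadataJson",
        "url": "https://<account>.blob.core.windows.net/container1/myvtt.vtt?<sas-token>",
    },
}

# Sanity check against the documented WebvttFileKind values.
assert iteration_input["webvttFile"]["kind"] in {
    "SourceLocaleSubtitle", "TargetLocaleSubtitle", "MetadataJson"}
```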