AzureAvatarVoiceSyncVoice interface

Configuration for an Azure avatar voice sync voice, which uses personal voice synthesis with an avatar character.

Properties

customLexiconUrl

URL of a custom lexicon file for pronunciation customization.

customTextNormalizationUrl

URL of a custom text normalization endpoint.

locale

Enforced locale in BCP-47 format for TTS output. If set, TTS always speaks in the specified locale. For example, setting locale to en-US forces an American English accent for all text content, even text in another language, and TTS outputs silence for languages the locale does not support (e.g., Chinese text with an en-US locale). If not set, TTS automatically detects the language from the text content.

model

Underlying neural model to use.

pitch

Pitch adjustment for the voice output. Follows the same rules as the pitch attribute of the SSML prosody element (see https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody). Typical values: a named level (x-low, low, medium, high, x-high, default), a relative change (e.g., +10%, -5%, +50Hz, -2st), or an absolute frequency (e.g., 200Hz).

preferLocales

Preferred locales in BCP-47 format that change the accents of languages. If not set, TTS uses the default accent for each language (e.g., American English for English, Mexican Spanish for Spanish). Setting this to ["en-GB", "es-ES"] changes the English accent to British English and the Spanish accent to European Spanish, while TTS can still speak other languages like French or Chinese with their default accents.

rate

Speaking rate adjustment for the voice output. Follows the same rules as the rate attribute of the SSML prosody element (see https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody). Typical values: a named level (x-slow, slow, medium, fast, x-fast, default), a relative percentage (e.g., +20%, -10%), or a non-negative multiplier (e.g., 0.5, 1.5).

style

Speaking style for the voice (e.g., 'cheerful', 'sad').

temperature

Temperature setting for the voice. The value must be between 0.0 and 1.0.

type

Discriminator for the voice type. Possible discriminator values are azure-custom, azure-standard, azure-personal, and avatar-voice-sync; for this interface the value is avatar-voice-sync.

volume

Volume adjustment for the voice output. Follows the same rules as the volume attribute of the SSML prosody element (see https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody). Typical values: a named level (silent, x-soft, soft, medium, loud, x-loud, default), an absolute number from 0.0 to 100.0, or a relative change (e.g., +10, -6dB).

Property Details

customLexiconUrl

URL of a custom lexicon file for pronunciation customization.

customLexiconUrl?: string

Property Value

string

customTextNormalizationUrl

URL of a custom text normalization endpoint.

customTextNormalizationUrl?: string

Property Value

string

locale

Enforced locale in BCP-47 format for TTS output. If set, TTS always speaks in the specified locale. For example, setting locale to en-US forces an American English accent for all text content, even text in another language, and TTS outputs silence for languages the locale does not support (e.g., Chinese text with an en-US locale). If not set, TTS automatically detects the language from the text content.

locale?: string

Property Value

string

model

Underlying neural model to use.

model: string

Property Value

string

pitch

Pitch adjustment for the voice output. Follows the same rules as the pitch attribute of the SSML prosody element (see https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody). Typical values: a named level (x-low, low, medium, high, x-high, default), a relative change (e.g., +10%, -5%, +50Hz, -2st), or an absolute frequency (e.g., 200Hz).

pitch?: string

Property Value

string
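
Because pitch is a free-form string, a client-side pre-check can catch obvious typos before a request is sent. The sketch below is a heuristic built only from the typical values listed above; the helper name looksLikePitch and both regular expressions are assumptions for illustration, and the service remains the authority on what it accepts.

```typescript
// Named levels listed in this reference for `pitch`.
const PITCH_LEVELS = ["x-low", "low", "medium", "high", "x-high", "default"];

// Relative change: a signed number followed by %, Hz, or st (e.g., +10%, -2st).
const RELATIVE_PITCH = /^[+-]\d+(\.\d+)?(%|Hz|st)$/;

// Absolute frequency: an unsigned number followed by Hz (e.g., 200Hz).
const ABSOLUTE_PITCH = /^\d+(\.\d+)?Hz$/;

// Hypothetical helper: returns true when the value matches one of the
// documented typical forms. It does not guarantee the service accepts it.
function looksLikePitch(value: string): boolean {
  return (
    PITCH_LEVELS.indexOf(value) !== -1 ||
    RELATIVE_PITCH.test(value) ||
    ABSOLUTE_PITCH.test(value)
  );
}
```

A value such as "loud" fails this check because it is a volume level, not a pitch level.
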

preferLocales

Preferred locales in BCP-47 format that change the accents of languages. If not set, TTS uses the default accent for each language (e.g., American English for English, Mexican Spanish for Spanish). Setting this to ["en-GB", "es-ES"] changes the English accent to British English and the Spanish accent to European Spanish, while TTS can still speak other languages like French or Chinese with their default accents.

preferLocales?: string[]

Property Value

string[]
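
The preferLocales array can be tidied before it is assigned. The following is a minimal sketch, assuming a hypothetical helper cleanPreferLocales and a deliberately loose shape check; it is not a full BCP-47 validator and is not part of the SDK.

```typescript
// Loose shape check for a tag such as "en-GB": a 2-3 letter language subtag
// followed by optional hyphen-separated subtags. This is an illustrative
// assumption, not a complete BCP-47 grammar.
const LOCALE_SHAPE = /^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$/;

// Hypothetical helper: keeps well-formed tags, drops duplicates, and
// preserves the caller's order.
function cleanPreferLocales(tags: string[]): string[] {
  const result: string[] = [];
  for (const tag of tags) {
    if (LOCALE_SHAPE.test(tag) && result.indexOf(tag) === -1) {
      result.push(tag);
    }
  }
  return result;
}

// "en_GB" uses an underscore and is dropped; the repeated "en-GB" is kept once.
const preferLocales = cleanPreferLocales(["en-GB", "es-ES", "en-GB", "en_GB"]);
// preferLocales is ["en-GB", "es-ES"]
```
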

rate

Speaking rate adjustment for the voice output. Follows the same rules as the rate attribute of the SSML prosody element (see https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody). Typical values: a named level (x-slow, slow, medium, fast, x-fast, default), a relative percentage (e.g., +20%, -10%), or a non-negative multiplier (e.g., 0.5, 1.5).

rate?: string

Property Value

string
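
Since rate is typed as a string, numeric settings have to be formatted before assignment. The helpers below are hypothetical (the names rateFromMultiplier and rateFromPercent do not come from the SDK) and only cover the multiplier and relative-percentage forms listed above.

```typescript
// Formats a non-negative multiplier, e.g. 1.5 -> "1.5".
function rateFromMultiplier(multiplier: number): string {
  if (!isFinite(multiplier) || multiplier < 0) {
    throw new RangeError("rate multiplier must be a non-negative number");
  }
  return String(multiplier);
}

// Formats a relative percentage with an explicit sign, e.g. 20 -> "+20%".
function rateFromPercent(percent: number): string {
  if (!isFinite(percent)) {
    throw new RangeError("rate percentage must be a finite number");
  }
  return (percent >= 0 ? "+" : "") + String(percent) + "%";
}
```

For example, rateFromPercent(-10) produces "-10%", which matches the relative form shown in the description.
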

style

Speaking style for the voice (e.g., 'cheerful', 'sad').

style?: string

Property Value

string

temperature

Temperature setting for the voice. The value must be between 0.0 and 1.0.

temperature?: number

Property Value

number
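
A small guard can enforce the documented range before the value is sent. This is a sketch only: the name checkTemperature is hypothetical, and treating both bounds as inclusive is an assumption, since the reference says only "between 0.0 and 1.0".

```typescript
// Hypothetical guard for `temperature`. Returns the value unchanged when it
// lies in [0, 1]; throws otherwise. Inclusive bounds are an assumption.
function checkTemperature(temperature: number): number {
  if (!isFinite(temperature) || temperature < 0 || temperature > 1) {
    throw new RangeError("temperature must be between 0.0 and 1.0");
  }
  return temperature;
}
```
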

type

Discriminator for the voice type. Possible discriminator values are azure-custom, azure-standard, azure-personal, and avatar-voice-sync; for this interface the value is avatar-voice-sync.

type: "avatar-voice-sync"

Property Value

"avatar-voice-sync"

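The literal type on the type property lets TypeScript narrow a union of voice configurations. The sketch below uses local stand-in types rather than the SDK's own: only type and model on the avatar voice sync shape come from this reference, and the other voice shapes are reduced to their discriminator as an assumption for illustration.

```typescript
// Stand-in for the interface documented on this page (required fields only).
interface AvatarVoiceSyncLike {
  type: "avatar-voice-sync";
  model: string;
}

// Placeholder for the other voice types; their real shapes are not shown here.
interface OtherVoiceLike {
  type: "azure-custom" | "azure-standard" | "azure-personal";
}

type VoiceLike = AvatarVoiceSyncLike | OtherVoiceLike;

// Checking the discriminator narrows `voice`, giving typed access to `model`.
function describeVoice(voice: VoiceLike): string {
  if (voice.type === "avatar-voice-sync") {
    return "avatar voice sync using model " + voice.model;
  }
  return "other voice type: " + voice.type;
}
```
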
volume

Volume adjustment for the voice output. Follows the same rules as the volume attribute of the SSML prosody element (see https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody). Typical values: a named level (silent, x-soft, soft, medium, loud, x-loud, default), an absolute number from 0.0 to 100.0, or a relative change (e.g., +10, -6dB).

volume?: string

Property Value

string
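
Putting the properties together, a configuration object might look like the sketch below. The interface is a local mirror of the signatures on this page so that the example is self-contained; in a real project it would come from the SDK package. The model name and lexicon URL are placeholders, not real values.

```typescript
// Local mirror of the properties documented on this page.
interface AvatarVoiceSyncConfig {
  type: "avatar-voice-sync";
  model: string;
  customLexiconUrl?: string;
  customTextNormalizationUrl?: string;
  locale?: string;
  preferLocales?: string[];
  pitch?: string;
  rate?: string;
  style?: string;
  temperature?: number;
  volume?: string;
}

const voice: AvatarVoiceSyncConfig = {
  type: "avatar-voice-sync",
  model: "example-model", // placeholder, not a real model name
  customLexiconUrl: "https://example.com/lexicon.xml", // placeholder URL
  preferLocales: ["en-GB", "es-ES"],
  pitch: "+10%",
  rate: "1.5",
  style: "cheerful",
  temperature: 0.7,
  volume: "soft",
};

// Serialize to inspect the payload shape.
console.log(JSON.stringify(voice, null, 2));
```

Here locale is left unset so that TTS detects the language from the text, while preferLocales adjusts only the English and Spanish accents.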