Hi Nikita Khandare,
I have reproduced this behavior using the sample SSML code provided. When Marathi text is used as input with a multilingual neural voice such as en-US-BrandonMultilingualNeural, the Text to Speech service successfully generates audio output. However, since Marathi is not officially supported by this voice, the pronunciation may be inaccurate or unclear.

In such cases, the service does not return an error. Instead, it attempts to interpret the input phonetically using the closest matching phonemes from the languages the voice does support, which can result in speech that sounds incorrect or resembles a different language. This behavior is expected: the service validates only the structure and syntax of the SSML input, not whether the language of the content is compatible with the selected voice. As a result, even if the spoken output is unintelligible or misleading, the request is treated as valid and audio is still produced.
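I have not seen your exact SSML, but a minimal payload along these lines is enough to reproduce the behavior (the Marathi sentence here is just a placeholder greeting, not taken from your sample):

```xml
<!-- Marathi text under a voice that does not officially support Marathi:
     the request is accepted and audio is produced, but pronunciation may be poor -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-BrandonMultilingualNeural">
    नमस्कार, तुम्ही कसे आहात?
  </voice>
</speak>
```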
To ensure proper pronunciation and meaningful output, it is recommended to use a voice that officially supports the intended language, as documented in the Azure Text to Speech language support list.
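As an illustration, the same text can be sent to a dedicated Marathi voice. The voice name below (mr-IN-AarohiNeural) is only an example; please confirm the exact voice names currently available for mr-IN against the language support list before using it:

```xml
<!-- Same placeholder Marathi text, but with the locale and a Marathi voice
     (example voice name; verify availability in the language support list) -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="mr-IN">
  <voice name="mr-IN-AarohiNeural">
    नमस्कार, तुम्ही कसे आहात?
  </voice>
</speak>
```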
I hope this information helps. Thank you!