Hi Nikita Khandare,
I have reproduced this behavior using the sample SSML code provided. When Marathi text is used as input with a multilingual neural voice such as en-US-BrandonMultilingualNeural, the Text to Speech service successfully generates audio output. However, since Marathi is not officially supported by this voice, the pronunciation may be inaccurate or unclear.

In such cases, the service does not return an error. Instead, it attempts to interpret the input phonetically using the closest matching phonemes from the languages the voice does support, which can result in speech that sounds incorrect or resembles a different language. This behavior is expected: the service validates only the structure and syntax of the SSML input, not whether the language of the content is compatible with the selected voice. As a result, even if the spoken output is unintelligible or misleading, the request is treated as valid and audio is still produced.
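I have not seen your exact SSML, but a minimal payload along these lines is enough to reproduce the behavior (the Marathi sentence here is just a placeholder greeting, not taken from your sample):

```xml
<!-- Marathi text under a voice that does not officially support Marathi:
     the request is accepted and audio is produced, but pronunciation may be poor -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-BrandonMultilingualNeural">
    नमस्कार, तुम्ही कसे आहात?
  </voice>
</speak>
```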
To ensure proper pronunciation and meaningful output, it is recommended to use a voice that officially supports the intended language, as documented in the Azure Text to Speech language support list.
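As an illustration, the same text can be sent to a dedicated Marathi voice. The voice name below (mr-IN-AarohiNeural) is only an example; please confirm the exact voice names currently available for mr-IN against the language support list before using it:

```xml
<!-- Same placeholder Marathi text, but with the locale and a Marathi voice
     (example voice name; verify availability in the language support list) -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="mr-IN">
  <voice name="mr-IN-AarohiNeural">
    नमस्कार, तुम्ही कसे आहात?
  </voice>
</speak>
```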
I hope this information helps. Thank you!