Different word pronunciation

Steinkrug, Michelle 71 Reputation points
2022-07-19T05:58:09.923+00:00

Hello,

I'm currently working with MS Speech Studio and using the Text-to-Speech function. I am producing German audio files and was wondering why words are sometimes spoken correctly and sometimes not.

Here is an example. If I write the following sentence in MS Speech Studio:

"Diese Phase ist als Dichteunterschied von 10 bis 30 Hounsfield-Einheiten zwischen Aorta abdominalis und Vena cava inferior definiert."

The speaker then pronounces the word "inferior" with an English accent, and there is no reason for it. If I add an extra space between "inferior" and "definiert", the speaker suddenly pronounces the word correctly and recognizes that it is a Latin term. In other positions in the text the word "inferior" is also recognized and pronounced correctly; only in combination with the word "definiert" does it currently fail.

Best,

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. romungi-MSFT 41,866 Reputation points Microsoft Employee
    2022-07-26T08:00:31.707+00:00

    @Steinkrug, Michelle The product team has given feedback that this is likely a language-detection issue in the model. In some cases the model assigns a word a different language ID, here en-US, so the word is pronounced as English in a German voice. The suggested workaround is to wrap the affected word in a <lang> tag in the SSML so that the model explicitly pronounces it in German. This is not ideal if your input arrives in real time, but if you are creating audio files for offline use, the workaround should produce a correct-sounding file.
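
As a rough illustration of that workaround, the Python sketch below builds an SSML string in which the problematic word is wrapped in a <lang> element. The helper name `wrap_word_in_lang` and the voice name `de-DE-KatjaNeural` are assumptions chosen for this example, not something the product team specified. Whether a given voice honors the <lang> element should be checked against the Speech service documentation.

```python
import re
from xml.sax.saxutils import escape


def wrap_word_in_lang(text, word, locale="de-DE", voice="de-DE-KatjaNeural"):
    """Return SSML where each whole-word occurrence of `word` is wrapped in <lang>.

    `wrap_word_in_lang` is a hypothetical helper; the voice name is an assumption.
    """
    # Split the text on whole-word matches so only the target word is tagged.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    parts = pattern.split(text)
    tagged = f'<lang xml:lang="{locale}">{escape(word)}</lang>'
    # Escape the surrounding text so characters like & or < stay valid XML.
    body = tagged.join(escape(part) for part in parts)
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{locale}">'
        f'<voice name="{voice}">{body}</voice>'
        "</speak>"
    )


sentence = (
    "Diese Phase ist als Dichteunterschied von 10 bis 30 Hounsfield-Einheiten "
    "zwischen Aorta abdominalis und Vena cava inferior definiert."
)
ssml = wrap_word_in_lang(sentence, "inferior")
print(ssml)
```

The resulting string can be pasted into the SSML view of Speech Studio or sent to the service from your own code. Only the single affected word is tagged, so the rest of the sentence is left to the model's normal handling.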

    If an answer is helpful, please accept it or upvote it, which might help other community members reading this thread.

