Bug Report: Mispronunciation of Isolated Hungarian Words in Azure Neural TTS (hu-HU-NoemiNeural), but not in context

Verbari LLC 20 Reputation points
2024-09-24T17:22:22.3+00:00

Description:
The Azure Neural TTS system mispronounces specific Hungarian words when using the hu-HU-NoemiNeural voice. The issue affects more than half of the vocabulary words in a recent production run (full SSML shared at the bottom of this post). The words are mispronounced both in production environments and when synthesized directly in Azure Speech Studio.

Steps to Reproduce:

  1. Input SSML: Below is an SSML excerpt that highlights the mispronunciations. In every case, the first instance of the word is mispronounced. In some cases, the second instance is mispronounced as well:
       <voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> pillantott. pillantott,</prosody><prosody rate="-35%"> pillantott.</prosody></voice>  <voice name="en-US-SaraNeural">glanced</voice>
       <voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> oldalra. oldalra,</prosody><prosody rate="-35%"> oldalra.</prosody></voice>  <voice name="en-US-SaraNeural">sideways</voice>
       <voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> mosoly. mosoly,</prosody><prosody rate="-35%"> mosoly.</prosody></voice>  <voice name="en-US-SaraNeural">smile</voice>
       <voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> figyelni. figyelni,</prosody><prosody rate="-35%"> figyelni.</prosody></voice>  <voice name="en-US-SaraNeural">observe</voice>
    
    The issue can be reproduced by submitting any of these words (pillantott, oldalra, mosoly, figyelni) individually through the Azure Speech Studio or API.
  2. Select Voice: Choose the hu-HU-NoemiNeural voice in the Azure Neural TTS system.
  3. Generate Speech:
    Synthesize the SSML or individual words using either the Azure Speech Studio or the API.
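
To make the per-word reproduction easier, here is a minimal Python sketch that wraps a single word in a standalone SSML document using the same period / comma / slowed pattern as the excerpt above. The function name `build_repro_ssml` and its parameters are hypothetical helpers, not part of any Azure SDK; the root `xml:lang="en-US"` is copied from the full SSML in this post. The script only builds the SSML and checks that it is well-formed XML; the output still has to be pasted into Speech Studio or sent through the API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_repro_ssml(word, voice="hu-HU-NoemiNeural", slow_rate="-35%", lang="en-US"):
    """Return a standalone SSML document that speaks `word` three times:
    followed by a period, followed by a comma, and once more at a slower rate.
    (Hypothetical helper for this report, not an Azure API.)"""
    w = escape(word)  # guard against &, <, > in the input text
    return (
        f'<speak xmlns="{SSML_NS}" version="1.0" xml:lang="{lang}">'
        f'<voice name="{voice}">'
        f'<prosody rate="0%" pitch="0%"> {w}. {w},</prosody>'
        f'<prosody rate="{slow_rate}"> {w}.</prosody>'
        f'</voice>'
        f'</speak>'
    )


# The four words called out in the steps above.
WORDS = ["pillantott", "oldalra", "mosoly", "figyelni"]
documents = {word: build_repro_ssml(word) for word in WORDS}

for word, ssml in documents.items():
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    print(ssml)
```

Each printed line is one self-contained SSML document, so a single word can be tested in isolation without the English voice segments.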

Expected Result:
The words should be pronounced correctly and naturally in Hungarian, regardless of prosody adjustments.

Actual Result:
The word pillantott is mispronounced in every case. Other words, such as oldalra, mosoly, and figyelni, are mispronounced intermittently. This disrupts the flow and clarity of Hungarian speech synthesis.

  • hu-HU-NoemiNeural exhibits this issue, but the hu-HU-TamasNeural voice does not have the same problem.
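
For a side-by-side listen, a small sketch along the following lines places the same word under both voices in one SSML document, so the difference between hu-HU-NoemiNeural and hu-HU-TamasNeural can be heard back to back. `build_ab_ssml` is a hypothetical helper name, and the script only generates and validates the SSML; it does not call the Speech service.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ab_ssml(word, voices=("hu-HU-NoemiNeural", "hu-HU-TamasNeural")):
    """Return one SSML document that speaks `word` once per voice, in order.
    (Hypothetical helper for comparison; voice names are used as given.)"""
    w = escape(word)
    segments = "".join(
        f'<voice name="{v}"><prosody rate="0%" pitch="0%"> {w}.</prosody></voice>'
        for v in voices
    )
    return f'<speak xmlns="{SSML_NS}" version="1.0" xml:lang="en-US">{segments}</speak>'


ab_document = build_ab_ssml("pillantott")
ET.fromstring(ab_document)  # raises ParseError if the SSML is not well-formed
print(ab_document)
```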

Additional Notes:

  • This problem persists even when the words are submitted individually in Azure Speech Studio. It appears to affect over 50% of the Hungarian vocabulary words in the full SSML below.
  • The behavior is reproducible in both Speech Studio and the API.

Thank you for your attention to this matter.

Full SSML:

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> őszi. őszi,</prosody><prosody rate="-35%"> őszi.</prosody></voice>  <voice name="en-US-SaraNeural">autumn</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> aranyos. aranyos,</prosody><prosody rate="-35%"> aranyos.</prosody></voice>  <voice name="en-US-SaraNeural">golden</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> vibráló. vibráló,</prosody><prosody rate="-35%"> vibráló.</prosody></voice>  <voice name="en-US-SaraNeural">vibrant</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> takaró. takaró,</prosody><prosody rate="-35%"> takaró.</prosody></voice>  <voice name="en-US-SaraNeural">blanket</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> csiripeltek. csiripeltek,</prosody><prosody rate="-35%"> csiripeltek.</prosody></voice>  <voice name="en-US-SaraNeural">chirped</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> hullámzó. hullámzó,</prosody><prosody rate="-35%"> hullámzó.</prosody></voice>  <voice name="en-US-SaraNeural">rippling</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> objektív. objektív,</prosody><prosody rate="-35%"> objektív.</prosody></voice>  <voice name="en-US-SaraNeural">lens</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> megörökíteni. megörökíteni,</prosody><prosody rate="-35%"> megörökíteni.</prosody></voice>  <voice name="en-US-SaraNeural">capture</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> kalandozott. kalandozott,</prosody><prosody rate="-35%"> kalandozott.</prosody></voice>  <voice name="en-US-SaraNeural">wandered</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> külföldön. külföldön,</prosody><prosody rate="-35%"> külföldön.</prosody></voice>  <voice name="en-US-SaraNeural">abroad</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> álláslehetőség. álláslehetőség,</prosody><prosody rate="-35%"> álláslehetőség.</prosody></voice>  <voice name="en-US-SaraNeural">job offer</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> pillantott. pillantott,</prosody><prosody rate="-35%"> pillantott.</prosody></voice>  <voice name="en-US-SaraNeural">glanced</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> oldalra. oldalra,</prosody><prosody rate="-35%"> oldalra.</prosody></voice>  <voice name="en-US-SaraNeural">sideways</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> mosoly. mosoly,</prosody><prosody rate="-35%"> mosoly.</prosody></voice>  <voice name="en-US-SaraNeural">smile</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> figyelni. figyelni,</prosody><prosody rate="-35%"> figyelni.</prosody></voice>  <voice name="en-US-SaraNeural">observe</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> hámozni. hámozni,</prosody><prosody rate="-35%"> hámozni.</prosody></voice>  <voice name="en-US-SaraNeural">peeling</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> zavar. zavar,</prosody><prosody rate="-35%"> zavar.</prosody></voice>  <voice name="en-US-SaraNeural">bothering</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> őszinte. őszinte,</prosody><prosody rate="-35%"> őszinte.</prosody></voice>  <voice name="en-US-SaraNeural">sincere</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> összeszorult. összeszorult,</prosody><prosody rate="-35%"> összeszorult.</prosody></voice>  <voice name="en-US-SaraNeural">tightened</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> bátorítóan. bátorítóan,</prosody><prosody rate="-35%"> bátorítóan.</prosody></voice>  <voice name="en-US-SaraNeural">encouragingly</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> ragyogóan. ragyogóan,</prosody><prosody rate="-35%"> ragyogóan.</prosody></voice>  <voice name="en-US-SaraNeural">brilliantly</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> közelség. közelség,</prosody><prosody rate="-35%"> közelség.</prosody></voice>  <voice name="en-US-SaraNeural">proximity</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> pótolható. pótolható,</prosody><prosody rate="-35%"> pótolható.</prosody></voice>  <voice name="en-US-SaraNeural">replaced</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> naplemente. naplemente,</prosody><prosody rate="-35%"> naplemente.</prosody></voice>  <voice name="en-US-SaraNeural">sunset</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> megnyomta. megnyomta,</prosody><prosody rate="-35%"> megnyomta.</prosody></voice>  <voice name="en-US-SaraNeural">pressed</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> tekintet. tekintet,</prosody><prosody rate="-35%"> tekintet.</prosody></voice>  <voice name="en-US-SaraNeural">gaze</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> álmodozott. álmodozott,</prosody><prosody rate="-35%"> álmodozott.</prosody></voice>  <voice name="en-US-SaraNeural">dreaming</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> barátság. barátság,</prosody><prosody rate="-35%"> barátság.</prosody></voice>  <voice name="en-US-SaraNeural">friendship</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> fontos. fontos,</prosody><prosody rate="-35%"> fontos.</prosody></voice>  <voice name="en-US-SaraNeural">important</voice>
<voice name="hu-HU-NoemiNeural"><prosody rate="0%" pitch="0%"> rájött. rájött,</prosody><prosody rate="-35%"> rájött.</prosody></voice>  <voice name="en-US-SaraNeural">realized</voice>
</speak>
Azure AI Speech