Thai text has pause before IPA phoneme, Vietnamese doesn't - why?

I'm MariOhn
2024-10-30T06:45:12.82+00:00

I'm seeing different behavior when I insert an IPA phoneme into text-to-speech input, depending on the surrounding language:

Vietnamese:

"không phải [May] xin lỗi" ("not [May], sorry")

  • Flows naturally without pauses

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US"><voice name="en-US-AvaMultilingualNeural">không phải <phoneme alphabet="ipa" ph="meɪ">May</phoneme> xin lỗi </voice></speak>

Thai:

"ฉันไม่ใช่ [May] ขอโทษ" ("I'm not [May], sorry")

  • Has a pause before the IPA phoneme only
  • Continues smoothly after it

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US"><voice name="en-US-AvaMultilingualNeural">ฉันไม่ใช่ <phoneme alphabet="ipa" ph="meɪ">May</phoneme> ขอโทษ</voice></speak>

Is this expected behavior? Is there any way to remove the pause before the IPA phoneme in Thai?

Azure AI Speech

Accepted answer
  RevelinoB
    2024-10-30T08:38:19.83+00:00

    Hi I'm MariOhn,

    The difference you’re observing between Vietnamese and Thai when using IPA phonemes in text-to-speech (TTS) may stem from how the TTS system handles language-specific prosody, especially in tonal languages or languages with complex phonetic structures. We’ve seen this with our customers using Azure Cognitive Services and have identified a few patterns:

    Vietnamese: Both of your requests use the same multilingual voice (en-US-AvaMultilingualNeural), so the difference lies in how that voice processes each language. With Vietnamese, phoneme insertions are often handled smoothly, possibly because of the language’s consistent syllable structure and the absence of certain prosodic markers that trigger pauses. Vietnamese handling may also be tuned to flow around phonetic tags, since phonemes are often intermixed with native phrases in TTS applications.

    Thai: With Thai, however, inserting an IPA phoneme can trigger a slight pause in some TTS implementations, likely because of how tonal adjustments and word boundaries are handled around foreign phonemes such as those marked in IPA.
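    One way to narrow this down is to generate request variants that differ in a single detail and compare them by ear. A plausible suspect (an assumption, not something confirmed in this thread) is whitespace: written Thai does not put spaces between words, and a space usually marks a phrase boundary, so the space before the <phoneme> tag might be read as a break. The smooth transition after the tag, which is also preceded by a space, suggests this may not be the full explanation. This Python sketch rebuilds the Thai request from the question with the space on each side of the tag switched on or off; `build_ssml` and `variants` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Root element and voice copied verbatim from the Thai SSML in the question.
SPEAK_OPEN = (
    '<speak xmlns="http://www.w3.org/2001/10/synthesis" '
    'xmlns:mstts="http://www.w3.org/2001/mstts" '
    'xmlns:emo="http://www.w3.org/2009/10/emotionml" '
    'version="1.0" xml:lang="en-US">'
    '<voice name="en-US-AvaMultilingualNeural">'
)
SPEAK_CLOSE = "</voice></speak>"


def build_ssml(before: str, word: str, ipa: str, after: str,
               space_before: bool = True, space_after: bool = True) -> str:
    """Place an IPA <phoneme> tag between two text fragments.

    The two flags control only the whitespace on each side of the tag,
    so the generated variants differ in exactly one detail.
    """
    ph = escape(ipa, {'"': "&quot;"})  # IPA string goes into an attribute
    tag = f'<phoneme alphabet="ipa" ph="{ph}">{escape(word)}</phoneme>'
    left = " " if space_before else ""
    right = " " if space_after else ""
    return f"{SPEAK_OPEN}{escape(before)}{left}{tag}{right}{escape(after)}{SPEAK_CLOSE}"


# The Thai sentence from the question, in every whitespace combination.
THAI = ("ฉันไม่ใช่", "May", "meɪ", "ขอโทษ")
variants = {
    (sb, sa): build_ssml(*THAI, space_before=sb, space_after=sa)
    for sb in (True, False)
    for sa in (True, False)
}

# Local sanity check: raises ParseError if any variant is malformed XML.
for ssml in variants.values():
    ET.fromstring(ssml)
```

    The (True, True) variant reproduces the original request exactly. Each string can then be sent through whatever synthesis call you already use, for example `SpeechSynthesizer.speak_ssml_async` in the Python Speech SDK, and the recordings compared.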

    Workarounds: To minimize the pause before the IPA phoneme in Thai, try:

    • Switching voices: Some TTS voices handle IPA tags differently, especially when working with non-English phrases.
    • Adjusting speed and pitch: Changing the rate and pitch (for example, with the SSML <prosody> element) can sometimes encourage smoother blending across IPA tags.
    • Alternative phonetic input: Using alternative phonetic spellings that approximate the sound without IPA can occasionally produce smoother output in languages sensitive to pauses.
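    The three workarounds above can be expressed as SSML variants of the same Thai sentence. This is a minimal Python sketch under several assumptions: `th-TH-PremwadeeNeural` stands in as an example Thai voice (check the current voice list for your region), the <prosody> values are arbitrary starting points to tune by ear, and `เมย์` is assumed to be an acceptable Thai respelling of "May" for the standard SSML <sub> alias. Whether a Thai voice accepts the English IPA string is untested. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Namespace declarations taken from the SSML in the question.
NS = (
    'xmlns="http://www.w3.org/2001/10/synthesis" '
    'xmlns:mstts="http://www.w3.org/2001/mstts" '
    'version="1.0"'
)

BEFORE, AFTER = "ฉันไม่ใช่", "ขอโทษ"
IPA_MAY = '<phoneme alphabet="ipa" ph="meɪ">May</phoneme>'


def speak(voice: str, body: str, lang: str = "en-US") -> str:
    """Wrap an SSML body in <speak> and <voice> elements."""
    return (
        f"<speak {NS} xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


def workaround_variants() -> dict[str, str]:
    """One SSML document per workaround; all values are illustrative."""
    sentence = f"{BEFORE} {IPA_MAY} {AFTER}"
    return {
        # Baseline: the request from the question.
        "baseline": speak("en-US-AvaMultilingualNeural", sentence),
        # Switching voices: a Thai voice (name assumed; verify it exists).
        "thai_voice": speak("th-TH-PremwadeeNeural", sentence, lang="th-TH"),
        # Adjusting speed and pitch: <prosody> around the whole sentence.
        "prosody": speak(
            "en-US-AvaMultilingualNeural",
            f'<prosody rate="-10%" pitch="+5%">{sentence}</prosody>',
        ),
        # Alternative phonetic input: a Thai respelling via <sub>, no IPA.
        "respelling": speak(
            "en-US-AvaMultilingualNeural",
            f'{BEFORE} <sub alias="เมย์">May</sub> {AFTER}',
        ),
    }


def is_well_formed(ssml: str) -> bool:
    """Cheap local check before sending: does the string parse as XML?"""
    try:
        ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for name, ssml in workaround_variants().items():
        print(name, is_well_formed(ssml))
```

    Running the script prints each variant name with whether it parsed as XML. The service would reject malformed SSML anyway, so the local check only catches typos before a request is made; the audible comparison still has to be done by synthesizing each variant.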

    These variations may not be entirely avoidable without changes to the TTS engine itself, since they usually come from the language-specific models in the backend of multilingual TTS systems. In Azure Cognitive Services, where speech synthesis is a core feature, these workarounds can still help improve the experience in multilingual applications.
