TTS output is different for different SSML

MH 1 Reputation point
2022-11-23T04:28:14.3+00:00

I am generating samples to test out the TTS SSML prosody functionality for ja-JP-NanamiNeural.

I expect rate="-50%" to produce slower speech; however, in Sample 1 the text inside the prosody tag is spoken faster than expected. Sample 2 shows the opposite problem: rate="+50%" should produce faster speech, but the generated audio is slower.

Samples 4-6 generate as expected for the given rate (-50% is slower, +50% is faster).

Is the audio generated for Samples 1 and 2 the expected behavior?
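
To make "faster" and "slower" measurable rather than judged by ear, the total duration of each generated file can be compared. This is only a minimal sketch: it assumes each sample's audio was saved as a PCM WAV file, the file names in the usage note are hypothetical, and it uses only Python's standard `wave` module. Because the prosody tag covers only part of the text in some samples, the total duration reflects the rate change only indirectly.

```python
import wave


def wav_duration_seconds(path):
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()


def compare_durations(paths):
    """Map each file path to its duration so renderings can be ranked."""
    return {path: wav_duration_seconds(path) for path in paths}
```

For example, `compare_durations(["sample1.wav", "sample2.wav"])` (hypothetical file names) would show whether the rate="-50%" rendering is in fact shorter than the rate="+50%" one.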

Sample 1

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="ja-JP" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice name="ja-JP-NanamiNeural">
    こんにちは世界。<prosody rate="-50%" pitch="0%">これはテス</prosody>ト文 1 です。これはテスト文 2 です。
  </voice>
</speak>

Sample 2

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="ja-JP" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice name="ja-JP-NanamiNeural">
    こんにちは世界。<prosody rate="+50%" pitch="0%">これはテス</prosody>ト文 1 です。これはテスト文 2 です。
  </voice>
</speak>

Sample 3

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="ja-JP" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice name="ja-JP-NanamiNeural">
    こんにちは世界。<prosody rate="-50%" pitch="0%">これはテスト文 1 です。</prosody>これはテスト文 2 です。
  </voice>
</speak>

Sample 4

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="ja-JP" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice name="ja-JP-NanamiNeural">
    こんにちは世界。<prosody rate="+50%" pitch="0%">これはテスト文 1 です。</prosody>これはテスト文 2 です。
  </voice>
</speak>

Sample 5

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="ja-JP" xmlns:mstts="https://www.w3.org/2001/mstts">  
  <voice name="ja-JP-NanamiNeural">  
    <prosody rate="-50%" pitch="0%">  
        こんにちは世界。これはテスト文 1 です。これはテスト文 2 です。  
    </prosody>  
  </voice>  
</speak>  

Sample 6

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="ja-JP" xmlns:mstts="https://www.w3.org/2001/mstts">  
  <voice name="ja-JP-NanamiNeural">  
    <prosody rate="+50%" pitch="0%">  
        こんにちは世界。これはテスト文 1 です。これはテスト文 2 です。  
    </prosody>  
  </voice>  
</speak>  
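
The six samples above vary only two things: the rate value and where the prosody span ends. In Samples 1 and 2 the span closes in the middle of a word (テス|ト). A minimal sketch for regenerating the same matrix programmatically is shown below. The helper names (`build_ssml`, `build_samples`, `prosody_text`) are hypothetical, the namespace strings are copied verbatim from the samples, and the script only builds and parses the SSML locally; it does not call the speech service.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace strings copied verbatim from the samples above.
SSML_NS = "https://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"
VOICE = "ja-JP-NanamiNeural"

# (text before the prosody span, text inside it, text after it)
SPLITS = {
    "partial word": ("こんにちは世界。", "これはテス", "ト文 1 です。これはテスト文 2 です。"),
    "one sentence": ("こんにちは世界。", "これはテスト文 1 です。", "これはテスト文 2 です。"),
    "whole text": ("", "こんにちは世界。これはテスト文 1 です。これはテスト文 2 です。", ""),
}


def build_ssml(before, inside, after, rate):
    """Return SSML in which only `inside` is wrapped in <prosody rate=...>."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="ja-JP" '
        f'xmlns:mstts="{MSTTS_NS}">'
        f'<voice name="{VOICE}">'
        f'{escape(before)}<prosody rate="{rate}" pitch="0%">'
        f'{escape(inside)}</prosody>{escape(after)}'
        f'</voice></speak>'
    )


def build_samples():
    """Build Samples 1-6 in order: each split at -50%, then at +50%."""
    return [
        build_ssml(before, inside, after, rate)
        for before, inside, after in SPLITS.values()
        for rate in ("-50%", "+50%")
    ]


def prosody_text(ssml):
    """Parse the SSML and return (rate, text covered by the prosody tag)."""
    root = ET.fromstring(ssml)  # raises ParseError if the SSML is malformed
    prosody = root.find(f".//{{{SSML_NS}}}prosody")
    return prosody.attrib["rate"], prosody.text


if __name__ == "__main__":
    for number, ssml in enumerate(build_samples(), start=1):
        rate, text = prosody_text(ssml)
        print(f"Sample {number}: rate={rate} applies to {text!r}")
```

Printing the covered text next to each rate makes it easy to confirm that only Samples 1 and 2 split a word across the prosody boundary, which is the one structural difference from the samples that behave as expected.
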
Azure AI services