Text to Speech: imprecise Break times

mrx 21 Reputation points
2022-03-17T13:38:56.147+00:00

Requesting a 20-second pause in text-to-speech produces pauses of only 5 s and 10 s with the code below.
I would expect 20 s!

Tested on the website and via the Python API:
https://azure.microsoft.com/en-gb/services/cognitive-services/text-to-speech/#features

  <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
    <voice name="en-US-SaraNeural">
      <mstts:express-as style="cheerful">
        <prosody pitch="0%" rate="1" volume="100">

        One. 
        <mstts:silence type="Sentenceboundary" value="20s"/>
        Two.

        One. 
        <break time="20s"/>
        Two.

        </prosody>
      </mstts:express-as>
    </voice>
  </speak>
Azure AI Language

Accepted answer
  1. GiftA-MSFT 11,166 Reputation points
    2022-03-17T15:14:29.863+00:00

    Hi, according to the documentation, the time value should be set to less than 5,000 ms. Hope this helps.
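
A minimal sketch of how the SSML from the question could be checked against that limit before it is sent to the service. It uses only the Python standard library. The 5,000 ms figure is taken from the answer above and should be verified against the current documentation; the helper names `duration_to_ms` and `find_long_pauses` are hypothetical and not part of any Azure SDK.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: limit quoted in the answer above; verify against the current docs.
MAX_PAUSE_MS = 5000

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "http://www.w3.org/2001/mstts"


def duration_to_ms(text):
    """Convert an SSML duration such as '20s' or '750ms' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    number, unit = float(match.group(1)), match.group(2)
    return int(round(number * (1000 if unit == "s" else 1)))


def find_long_pauses(ssml, limit_ms=MAX_PAUSE_MS):
    """Return (tag, duration_ms) for every pause longer than limit_ms."""
    root = ET.fromstring(ssml)
    found = []
    for element in root.iter():  # document order
        if element.tag == f"{{{SSML_NS}}}break":
            raw = element.get("time")
        elif element.tag == f"{{{MSTTS_NS}}}silence":
            raw = element.get("value")
        else:
            continue
        if raw is None:  # e.g. <break strength="..."/> has no explicit time
            continue
        ms = duration_to_ms(raw)
        if ms > limit_ms:
            found.append((element.tag.split("}")[-1], ms))
    return found


# Shortened copy of the SSML from the question.
QUESTION_SSML = """<speak xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts" version="1.0" xml:lang="en-US">
  <voice name="en-US-SaraNeural">
    <mstts:express-as style="cheerful">
      One.
      <mstts:silence type="Sentenceboundary" value="20s"/>
      Two.
      One.
      <break time="20s"/>
      Two.
    </mstts:express-as>
  </voice>
</speak>"""

if __name__ == "__main__":
    for tag, ms in find_long_pauses(QUESTION_SSML):
        print(f"<{tag}> requests {ms} ms, above the {MAX_PAUSE_MS} ms limit")
```

This only flags the requested durations; it does not predict how long the synthesized pause will actually be.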


1 additional answer

  1. mrx 21 Reputation points
    2022-03-18T09:18:14.36+00:00

    As you can see in my second comment, 4 × 5000 ms does not add up to 20 s but to only 10.71 s, which makes this approach useless.

