Hi, according to the documentation, the break time value should be set to less than 5,000 ms. Hope this helps.
Text to Speech: imprecise Break times
mrx 21 Reputation points
Asking for a 20-second pause in text-to-speech gives pauses of only 5 s and 10 s with the SSML below.
I would expect 20 s!
Tested on the website and via the Python API:
https://azure.microsoft.com/en-gb/services/cognitive-services/text-to-speech/#features
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
<voice name="en-US-SaraNeural">
<mstts:express-as style="cheerful">
<prosody pitch="0%" rate="1" volume="100">
One.
<mstts:silence type="Sentenceboundary" value="20s"/>
Two.
One.
<break time="20s"/>
Two.
</prosody>
</mstts:express-as>
</voice>
</speak>
Accepted answer
1 additional answer
mrx 21 Reputation points
2022-03-18T09:18:14.36+00:00 As you can see in my second comment, chaining 4 breaks of 5000 ms each does not produce 20 s of silence but 10.71 s, which makes it useless.
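Since neither a single long break nor chained breaks gave the requested length in this thread, one possible workaround is to leave the long pause out of the SSML and add it to the audio afterwards. The following is only a minimal sketch, not something proposed by the posters: it assumes each sentence has been synthesized separately and is available as uncompressed PCM WAV bytes, and the names `join_with_silence` and `wav_duration` are hypothetical helpers built on Python's standard `wave` module.

```python
import io
import wave


def wav_duration(wav_bytes: bytes) -> float:
    """Return the playing time of a PCM WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def join_with_silence(first_wav: bytes, second_wav: bytes, pause_seconds: float) -> bytes:
    """Concatenate two PCM WAV clips with an exact stretch of silence between them.

    Assumption: both clips are uncompressed PCM WAV with identical
    channel count, sample width and sample rate.
    """
    with wave.open(io.BytesIO(first_wav), "rb") as a, \
            wave.open(io.BytesIO(second_wav), "rb") as b:
        fmt = (a.getnchannels(), a.getsampwidth(), a.getframerate())
        if fmt != (b.getnchannels(), b.getsampwidth(), b.getframerate()):
            raise ValueError("both clips must share channels, sample width and rate")
        frames_a = a.readframes(a.getnframes())
        frames_b = b.readframes(b.getnframes())

    nchannels, sampwidth, framerate = fmt
    # 8-bit WAV samples are unsigned (silence is 0x80); wider PCM is signed (silence is 0x00).
    zero = b"\x80" if sampwidth == 1 else b"\x00"
    silence = zero * (round(pause_seconds * framerate) * nchannels * sampwidth)

    out = io.BytesIO()
    with wave.open(out, "wb") as joined:
        joined.setnchannels(nchannels)
        joined.setsampwidth(sampwidth)
        joined.setframerate(framerate)
        joined.writeframes(frames_a + silence + frames_b)
    return out.getvalue()
```

Here `first_wav` and `second_wav` would be the audio for "One." and "Two." requested in two separate synthesis calls, and `wav_duration` can be used to check that the joined clip is as long as expected. Any leading or trailing silence the service adds to each clip still counts toward the total, so the measured gap may be slightly longer than `pause_seconds`.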