ssml prosody tag

Question

Lucia Pozzan 1

According to SSML 1.1 (https://www.w3.org/TR/speech-synthesis11/), the rate attribute of the prosody element should only accept non-negative percentages or one of the predefined labels:

"rate: a change in the speaking rate for the contained text. Legal values are: a non-negative percentage or "x-slow", "slow", "medium", "fast", "x-fast", or "default". Labels "x-slow" through "x-fast" represent a sequence of monotonically non-decreasing speaking rates. When the value is a non-negative percentage it acts as a multiplier of the default rate. For example, a value of 100% means no change in speaking rate, a value of 200% means a speaking rate twice the default rate, and a value of 50% means a speaking rate of half the default rate. The default rate for a voice depends on the language and dialect and on the personality of the voice. The default rate for a voice should be such that it is experienced as a normal speaking rate for the voice when reading aloud text. Since voices are processor-specific, the default rate will be as well."

However, this does not seem to be the case when specifying the prosody rate in Microsoft TTS: <prosody rate="30.00%"> plays faster than the default rate (100%) and seems to be interpreted as <prosody rate="+30.00%">.

Is this a bug or a conscious decision to depart from SSML standards? Is there a way to force the tag to be interpreted as intended?

romungi-MSFT 48,916 Reputation points Microsoft Employee Moderator

2021-07-08T07:04:33.18+00:00

@Lucia Pozzan By design, Azure TTS treats the rate as a relative value. As per the documentation:

I did not find a way to force it to use the standard as mentioned above, though.
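
If the behavior is as described in this thread (an unsigned percentage such as "30.00%" is read as a relative change of +30%), one possible workaround is to convert the SSML 1.1 style multiplier into a signed relative percentage before building the tag. The sketch below is only an illustration under that assumption; the helper names absolute_to_relative_rate and prosody are hypothetical and not part of any Azure SDK.

```python
from xml.sax.saxutils import escape


def absolute_to_relative_rate(absolute_percent: float) -> str:
    """Convert an SSML 1.1 style rate (100 = default speed) into a signed
    relative percentage (0 = default speed).

    Assumption: the TTS engine reads the percentage as a change relative
    to the default rate, as reported in this thread.
    """
    if absolute_percent < 0:
        raise ValueError("SSML 1.1 only allows non-negative percentages")
    delta = absolute_percent - 100.0
    # The explicit sign makes the relative meaning unambiguous.
    return f"{delta:+.2f}%"


def prosody(text: str, absolute_percent: float) -> str:
    """Wrap text in a prosody element using the converted rate."""
    rate = absolute_to_relative_rate(absolute_percent)
    return f'<prosody rate="{rate}">{escape(text)}</prosody>'


if __name__ == "__main__":
    # 30% of the default rate in SSML 1.1 terms becomes a -70% change.
    print(prosody("Hello world", 30))
```

With this conversion, a value meant as "half the default rate" (50%) is sent as "-50.00%", and "twice the default rate" (200%) is sent as "+100.00%".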
