How to set parameters such as emotion and speech speed for text to speech in REST API?

Question

How to set parameters such as emotion and speech speed for text to speech in REST API?

赵霄飞赵 0

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-07-18T04:45:42.6533333+00:00

@赵霄飞赵 Did the below response help answer your question?

1 answer

Answer 1

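@赵霄飞赵 In the text to speech REST API, emotion and speed are not separate request parameters; you set them in the SSML that you send as the request body. To express emotion, use the mstts:express-as element with its style attribute (and, optionally, styledegree for intensity). For example: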

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
    <voice name="zh-CN-XiaomoNeural">
        <mstts:express-as style="sad" styledegree="2">
            快走吧，路上一定要注意安全，早去早回。
        </mstts:express-as>
    </voice>
</speak>

For speed, use the rate attribute of the prosody element. For example, the snippet below raises the speaking rate to 30% above the default rate.

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-JennyNeural">
        <prosody rate="+30.00%">
            Enjoy using text to speech.
        </prosody>
    </voice>
</speak>
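
If you build the SSML in code rather than by hand, the two elements above can be combined in one document. The following is only a minimal Python sketch: build_ssml is a hypothetical helper name, and nesting prosody inside mstts:express-as is my assumption of how you would combine them, so please check the result against the SSML documentation for your voice.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(text, voice, lang, style=None, styledegree=None, rate=None):
    """Return an SSML document as a string (hypothetical helper, not an SDK call)."""
    # Escape &, < and > so that the spoken text cannot break the XML.
    body = escape(text)
    if rate is not None:
        # Speaking rate, e.g. "+30.00%" as in the snippet above.
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    if style is not None:
        # Emotion, e.g. style="sad" with an optional styledegree.
        attrs = f"style={quoteattr(style)}"
        if styledegree is not None:
            attrs += f" styledegree={quoteattr(str(styledegree))}"
        body = f"<mstts:express-as {attrs}>{body}</mstts:express-as>"
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


if __name__ == "__main__":
    # Same voice and style as the first snippet, plus a faster rate.
    print(build_ssml("快走吧，路上一定要注意安全，早去早回。", "zh-CN-XiaomoNeural",
                     "zh-CN", style="sad", styledegree=2, rate="+30.00%"))
```

Escaping the text matters because a stray & or < in the input would otherwise make the request body invalid XML.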

Please refer to the SSML documentation for the other features that SSML supports. I personally find the Audio Content Creation tool in Speech Studio useful for tuning these values: once the output sounds right, you can copy the SSML elements from Speech Studio.
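
Since the question is about the REST API, here is a rough Python sketch of posting SSML to the service using only the standard library. The endpoint pattern, header names, and output format are what I understand the text to speech REST API to expect, but treat them as assumptions and verify them against the REST API reference. build_tts_request and the SPEECH_KEY / SPEECH_REGION environment variable names are hypothetical.

```python
import os
import urllib.request

# SSML copied from the prosody example in this answer.
SSML = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural">'
    '<prosody rate="+30.00%">Enjoy using text to speech.</prosody>'
    "</voice></speak>"
)


def build_tts_request(ssml, region, subscription_key,
                      output_format="audio-16khz-128kbitrate-mono-mp3"):
    """Build (but do not send) a POST request carrying SSML as the body."""
    # Assumed regional endpoint pattern for the text to speech REST API.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "ssml-rest-sketch",  # arbitrary application name
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"),
                                  headers=headers, method="POST")


if __name__ == "__main__":
    key = os.environ.get("SPEECH_KEY")        # hypothetical variable name
    region = os.environ.get("SPEECH_REGION")  # hypothetical variable name
    if key and region:
        # Send the request and save the returned audio bytes.
        with urllib.request.urlopen(build_tts_request(SSML, region, key)) as resp:
            with open("output.mp3", "wb") as f:
                f.write(resp.read())
    else:
        print("Set SPEECH_KEY and SPEECH_REGION to send the request.")
```

The emotion and speed settings travel entirely inside the SSML body, so changing them means changing the SSML string, not the headers or the URL.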

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, do let us know.
