How do I set parameters such as emotion and speaking speed for text to speech in the REST API?

赵霄飞 赵 0 Reputation points
2023-07-14T03:08:43.47+00:00

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
2,069 questions

1 answer

  1. romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator
    2023-07-14T08:12:04.3266667+00:00

    @赵霄飞 赵 To express emotion, use the mstts:express-as element inside the voice element. For example:

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="zh-CN">
        <voice name="zh-CN-XiaomoNeural">
            <mstts:express-as style="sad" styledegree="2">
                快走吧,路上一定要注意安全,早去早回。
            </mstts:express-as>
        </voice>
    </speak>
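
    Since the question is about the REST API, the SSML usually has to be assembled in code before it is sent. The following is a minimal Python sketch, not an official helper: build_express_as_ssml and its parameters are hypothetical names introduced here, and the mstts namespace URI is written with http, which is how I understand the SSML documentation to define it. The function escapes the text, builds the same structure as the snippet above, and parses the result once to confirm it is well-formed XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "http://www.w3.org/2001/mstts"  # assumption: documented URI uses http


def build_express_as_ssml(text, voice, lang, style, style_degree=None):
    """Return an SSML string that wraps `text` in mstts:express-as.

    Hypothetical helper for illustration; it does not check that the
    chosen voice actually supports the requested style.
    """
    degree_attr = ""
    if style_degree is not None:
        degree_attr = f" styledegree={quoteattr(str(style_degree))}"
    ssml = (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}{degree_attr}>"
        f"{escape(text)}"  # escape &, <, > so user text cannot break the XML
        "</mstts:express-as></voice></speak>"
    )
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml
```

    For example, build_express_as_ssml("快走吧", "zh-CN-XiaomoNeural", "zh-CN", "sad", 2) yields markup equivalent to the snippet above. Which styles are available depends on the voice, so the style value is best taken from the voice's documented list.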

    For speed, use the rate attribute of the prosody element. For example, the snippet below raises the speaking rate to 30% above the default rate.

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voice name="en-US-JennyNeural">
            <prosody rate="+30.00%">
                Enjoy using text to speech.
            </prosody>
        </voice>
    </speak>
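
    To connect this to the REST API: the SSML document is sent as the body of a POST request. The sketch below uses only the Python standard library. The endpoint pattern and header names reflect my understanding of the text to speech REST API reference and should be checked against the current documentation; the region, the key, the User-Agent value, and the function names are placeholders introduced for illustration.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text, voice="en-US-JennyNeural", lang="en-US", rate="+30.00%"):
    """Return SSML that speaks `text` at the given prosody rate."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


def build_tts_request(ssml, region, key, output_format="riff-24khz-16bit-mono-pcm"):
    """Build (but do not send) the POST request that carries the SSML."""
    # Assumption: regional endpoint pattern from the REST API reference.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,          # your Speech resource key
        "Content-Type": "application/ssml+xml",    # body is an SSML document
        "X-Microsoft-OutputFormat": output_format,  # requested audio format
        "User-Agent": "ssml-rest-sketch",          # placeholder application name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


def synthesize(ssml, region, key):
    """Send the request and return the audio bytes from the response body."""
    with urllib.request.urlopen(build_tts_request(ssml, region, key)) as resp:
        return resp.read()
```

    Calling synthesize(build_prosody_ssml("Enjoy using text to speech."), "<your-region>", "<your-key>") should return audio bytes that can be written to a .wav file. The same request works for the mstts:express-as example, because emotion and speed are both set in the SSML body rather than in request parameters.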

    Please refer to the SSML documentation for the other features that SSML supports. I personally find the Audio Content Creation tool in Speech Studio useful for tuning these values: once the output sounds the way you want, you can copy the SSML elements from Speech Studio into your request.

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.
