How to add break times in Azure SSML

Jakub Chudiak 0 Reputation points
2024-07-31T10:39:26.1433333+00:00

How can I add breaks in SSML without dedicating a separate voice tag to each one, as in the example below? The limit is 50 voice tags.

<?xml version="1.0" encoding="UTF-8"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" version="1.0" xml:lang="en-EN">
    <voice name="en-GB-RyanNeural"><break time="1230ms"/></voice>
    <voice name="en-GB-RyanNeural"><mstts:audioduration value="2440ms"/>This is a jammy Margarita.</voice>
    <voice name="en-GB-RyanNeural"><break time="1850ms"/></voice>
    <voice name="en-GB-RyanNeural"><mstts:audioduration value="4600ms"/>Let me show you a little secret ingredient. 2 teaspoons of your favourite jam.</voice>
    <voice name="en-GB-RyanNeural"><break time="4790ms"/></voice>
</speak>


But I also want to keep the specified duration for each piece of text.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Sina Salam 9,406 Reputation points
    2024-07-31T11:38:20.6333333+00:00

    Hello Jakub Chudiak,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Problem

    I understand that you would like to add break times in Azure SSML while still keeping the duration of your text.

    Solution

    To manage breaks in SSML without dedicating a separate <voice> tag for each, you can consolidate the text and use the <break> tag within a single <voice> block. This approach allows you to specify pauses while keeping the same voice settings throughout the text.

    <?xml version="1.0" encoding="UTF-8"?>
    <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" version="1.0" xml:lang="en-GB">
        <voice name="en-GB-RyanNeural">
            <break time="1230ms"/>
            <mstts:audioduration value="2440ms"/>This is a jammy Margarita.
            <break time="1850ms"/>
            <mstts:audioduration value="4600ms"/>Let me show you a little secret ingredient. 2 teaspoons of your favourite jam.
            <break time="4790ms"/>
        </voice>
    </speak>
    

    By using a single <voice> tag, you stay under the limit of 50 <voice> tags and still control the pacing and pauses in the synthesized speech. For further reading, see https://www.w3.org/TR/speech-synthesis11/#S3 and https://www.w3.org/TR/speech-synthesis/#Voice and https://www.w3.org/TR/speech-synthesis/#S3.2.3
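
If the SSML is generated from a list of timings rather than written by hand, a short Python sketch like the one below could build the single-voice document. It uses only the standard library's xml.etree.ElementTree. The function name build_ssml and the tuple layout of the items are made up for illustration and are not part of any Azure SDK. One caveat to check on your side: as far as I know, Azure's documentation describes mstts:audioduration as applying to all text in its enclosing voice element, so please verify that per-sentence durations behave as expected when they share one voice element.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "http://www.w3.org/2001/mstts"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements without a prefix and Azure extensions as mstts:*
ET.register_namespace("", SSML_NS)
ET.register_namespace("mstts", MSTTS_NS)


def build_ssml(voice_name, items, lang="en-GB"):
    """Build one <speak> document with a single <voice> element.

    items is a hypothetical layout chosen for this sketch:
      ("break", milliseconds)
      ("text", duration_in_milliseconds, text)
    """
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    for item in items:
        if item[0] == "break":
            ET.SubElement(voice, f"{{{SSML_NS}}}break", {"time": f"{item[1]}ms"})
        elif item[0] == "text":
            duration = ET.SubElement(
                voice, f"{{{MSTTS_NS}}}audioduration", {"value": f"{item[1]}ms"}
            )
            # The spoken text follows the empty audioduration element
            duration.tail = item[2]
        else:
            raise ValueError(f"unknown item kind: {item[0]!r}")
    return ET.tostring(speak, encoding="unicode")


items = [
    ("break", 1230),
    ("text", 2440, "This is a jammy Margarita."),
    ("break", 1850),
    ("text", 4600, "Let me show you a little secret ingredient. "
                   "2 teaspoons of your favourite jam."),
    ("break", 4790),
]
ssml = build_ssml("en-GB-RyanNeural", items)
print(ssml)
```

However many breaks the list contains, the output has exactly one <voice> element, so the count of <voice> tags no longer grows with the number of pauses.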

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close out the thread by upvoting and accepting this as an answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

