How can I insert a time-break at the end of a line/paragraph when using a multilingual voice in Azure?

MartinGriewel 5 Reputation points
2024-07-11T12:48:25.1233333+00:00

Hello,
I am using the Azure AI text-to-speech web interface.

When I generate audio from the following text, both Strong breaks are applied.

[Guy] This is a text. [Strong] This is a text after a break. [Strong]
[Guy] And this is a text in a new line, after a break at the end of the previous line.

When I generate audio from the same text with a multilingual voice instead, only the first Strong break is applied. The second Strong break, at the end of the line, is ignored.

[Florian Multilingual] This is a text. [Strong] This is a text after a break. [Strong]
[Florian Multilingual] And this is a text in a new line, after a break at the end of the previous line.

All multilingual voices seem to ignore breaks at the end of a line/paragraph.
With all standard (non-multilingual) voices, the breaks work.

How can I insert a break at the end of a line/paragraph when using a multilingual voice?

Thanks in advance
Martin

Azure AI services

2 answers

  1. MartinGriewel 5 Reputation points
    2024-07-15T14:32:30.93+00:00

    I have just found a workaround.

    When I add a space after the time-break at the end of the line, the time-break is also recognized with multilingual voices. The result is fine, but you cannot see in the source text whether there is a space or not. Therefore, it is not a real solution, only a workaround.
    It would be nice if Microsoft could fix this problem with a real solution.

    1 person found this answer helpful.

  2. Amira Bedhiafi 20,101 Reputation points
    2024-07-11T13:41:55.6366667+00:00

    I am not an expert in this matter, but here is what I found on some blogs about how to structure your SSML to insert breaks, including breaks at the end of a line or paragraph:

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <voice name="de-DE-FlorianMultilingualNeural">
        This is a text.
        <break strength="strong"/>
        This is a text after a break.
        <break strength="strong"/>
      </voice>
      <voice name="de-DE-FlorianMultilingualNeural">
        And this is a text in a new line, after a break at the end of the previous line.
      </voice>
    </speak>
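A hedged variation on the SSML above, untested against the multilingual-voice behavior described in the question: in the markup above, the second break is the last child of the first `voice` element. The Python sketch below generates the same text but keeps every sentence inside a single `voice` element, so each break sits between two pieces of text rather than at the end of a voice. The helper name `build_ssml` and its parameters are hypothetical, the voice name `de-DE-FlorianMultilingualNeural` is assumed to be the full Azure name of "Florian Multilingual", and the space after each break only mirrors the workaround from the first answer. Only the standard library (`xml.etree.ElementTree`) is used.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

# Serialize SSML elements with a default xmlns="..." instead of "ns0:" prefixes.
# Note: this registration is global for the whole Python process.
ET.register_namespace("", SSML_NS)


def build_ssml(segments, voice="de-DE-FlorianMultilingualNeural",
               lang="en-US", break_attrs=None):
    """Hypothetical helper: put all text segments in ONE <voice> element,
    with a <break> between consecutive segments (never at the very end)."""
    if not segments:
        raise ValueError("need at least one text segment")
    attrs = break_attrs or {"strength": "strong"}

    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    voice_el.text = segments[0]
    for text in segments[1:]:
        brk = ET.SubElement(voice_el, f"{{{SSML_NS}}}break", dict(attrs))
        # The space after the break mirrors the "add a space" workaround.
        brk.tail = " " + text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml([
        "This is a text.",
        "This is a text after a break.",
        "And this is a text in a new line, after a break at the end of the previous line.",
    ]))
```

The resulting string could then be sent to the service, for example with the Speech SDK's `SpeechSynthesizer.speak_ssml_async`. Whether a multilingual voice honors every break with this layout is an assumption to confirm by listening to the output.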
    

    Alternatively, set an explicit pause length with the time attribute of the break tag. The value needs a unit (ms or s):

    <break time="2000ms"/>
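SSML expects the `time` value of a break to carry a unit (`ms` or `s`), so a bare `2000` is best avoided. The sketch below is a hypothetical Python helper, not part of any Azure SDK, that normalizes inputs such as `"2000"`, `"2s"` or `"1.5s"` into a millisecond string and wraps it in a `<break>` tag. Treating a unit-less number as milliseconds is an assumption, and the helper does not enforce the service's maximum pause length, which should be taken from the Azure SSML documentation.

```python
import re

# Hypothetical helper: accepts "2000", "2000ms", "2s", "1.5s" or an int.
# A bare number is ASSUMED to mean milliseconds; SSML itself requires the unit.
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s)?\s*$")


def normalize_break_time(value):
    """Return the duration as an SSML time string in milliseconds, e.g. "2000ms"."""
    match = _DURATION.match(str(value))
    if match is None:
        raise ValueError(f"not a duration: {value!r}")
    number, unit = match.groups()
    millis = float(number) * (1000 if unit == "s" else 1)
    return f"{round(millis)}ms"


def break_tag(value):
    """Build a <break> element string with an explicit, unit-bearing duration."""
    return f'<break time="{normalize_break_time(value)}"/>'


if __name__ == "__main__":
    for raw in ("2000", "2s", "1.5s", 750):
        print(raw, "->", break_tag(raw))
```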
    

    https://stackoverflow.com/questions/75869528/how-to-customize-silence-time-between-sentence-groups-in-azure-text-to-speech