Azure Speech Studio – Text to Speech behavior change regarding paragraph blocks

2025-12-22T12:07:23.8833333+00:00

Previously, when using Azure Speech Studio (Text to Speech) with plain text input, each text block separated by blank lines was treated as an independent paragraph and generated distinct audio segments.

Currently, the same input is being interpreted as a single continuous text, even when blocks are clearly separated by line breaks or blank lines. As a result, different text blocks are being merged or interpolated into one audio output.

This behavior change is impacting existing workflows that relied on block-based text input without requiring SSML.

Questions:

  • Was this behavior intentionally changed?
  • Is there any configuration or option to restore the previous behavior?
  • Is SSML now mandatory to define paragraph boundaries?
  • Is this a known issue or regression?

Expected behavior:

Each text block separated by blank lines should be treated as a paragraph, without requiring manual SSML wrapping.

Thank you.

Azure Speech in Foundry Tools

3 answers

  1. Anonymous
    2025-12-25T16:58:42.6433333+00:00

    Hi Brasil Treinamentos Atendente 02

    Azure Speech Studio previously treated each block of text separated by blank lines as independent paragraphs. This meant each block produced its own audio segment without requiring SSML. This behavior changed recently: Speech Studio now merges all blocks into a single continuous paragraph, even when blank lines exist. This is confirmed in the Microsoft Q&A thread describing the exact same regression.

    There is no official Microsoft documentation stating that this was an intentional product update. The Q&A post shows that the community is treating it as an unexpected change. There is no announcement, no release note, and no configuration flag indicating that paragraph handling was redesigned. Therefore, the current evidence points to an unintentional behavioral change (likely a regression).

    Because blank lines are no longer honored as paragraph breaks, the only guaranteed way to force separation is to use explicit SSML tags, such as:

    • <p>…</p> for paragraphs
    • <break time="Xms"/> for pauses

    The thread confirms SSML is now required to ensure paragraph‑level separation when using Speech Studio.
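
    The conversion from blank-line-separated text to SSML paragraphs can be automated. The following is a minimal Python sketch, not an official tool: the function name `blocks_to_ssml` is hypothetical, and the default voice name is only an example value that you would replace with a voice available in your resource.

```python
import re
from xml.sax.saxutils import escape


def blocks_to_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap each blank-line-separated block of plain text in an SSML <p> element.

    Hypothetical helper; the voice and language defaults are example values.
    """
    # Normalize Windows line endings, then split on one or more blank lines.
    text = text.replace("\r\n", "\n").strip()
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]

    # Collapse line breaks inside a block and escape &, <, > so the XML stays valid.
    paragraphs = "\n".join(
        f"    <p>{escape(' '.join(block.split()))}</p>" for block in blocks
    )

    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">\n'
        f'  <voice name="{voice}">\n'
        f"{paragraphs}\n"
        "  </voice>\n"
        "</speak>"
    )


if __name__ == "__main__":
    sample = "First block,\nstill the first block.\n\nSecond block."
    print(blocks_to_ssml(sample))
```

    The resulting string can be pasted into the Speech Studio editor in SSML mode or sent through the SDK or REST API.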

    Azure Speech Studio’s Text‑to‑Speech engine no longer treats blank lines as paragraph separators, causing previously working multi‑block scripts to merge into one continuous audio output. Microsoft has not documented this as an intentional update, no setting exists to restore the prior behavior, and SSML is now required to reliably define paragraphs. Based on the Q&A report, this is most consistent with an unintentional regression rather than a new feature design.

    References

    Azure Text to Speech https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech

    SSML support in Azure Speech https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup

    I hope this helps. Let me know if you have any further queries.

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

    Thank you!

  2. Sina Salam 29,106 Reputation points Volunteer Moderator
    2025-12-26T10:28:52.16+00:00

    Hello Brasil Treinamentos Atendente 02 ,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are asking about a behavior change in Azure Speech Studio Text to Speech regarding paragraph blocks.

    Yes, Azure Speech Studio’s plain text processor no longer interprets blank lines as paragraph separators, and there are no documented options to revert to the previous blank-line behavior.

    To ensure each paragraph is treated separately in audio, convert plain text into SSML (Speech Synthesis Markup Language) and wrap text appropriately:

    1. Use SSML Tags for Paragraphs:
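         <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
           <!-- Azure SSML expects a voice element inside speak; the voice name below is only an example -->
           <voice name="en-US-JennyNeural">
             <p>First paragraph text here</p>
             <p>Second paragraph text here</p>
             <p>Third paragraph text here</p>
           </voice>
         </speak>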
      
    2. Optionally, add explicit pauses between sections with the SSML break element, for example <break time="500ms"/>. SSML gives precise control over structure and timing, as stated here: https://docs.azure.cn/en-us/ai-services/speech-service/how-to-speech-synthesis

    In addition to the above, you have the following implementation options:

    A. In the Azure Speech Studio UI, paste SSML directly into the input editor.

    B. If you are using the Speech SDK, use APIs such as SpeakSsmlAsync() instead of plain text input.

    C. If you are using the REST API, set Content-Type: application/ssml+xml and send the SSML payload.
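
    For option C, the request can be sketched in Python with the standard library only. This is a hedged illustration, not a definitive client: the helper name `build_tts_request` and the `User-Agent` value are hypothetical, the region and key are placeholders, and the endpoint pattern and header names reflect the Text to Speech REST API as I understand it, so check them against the current documentation. The sketch builds the request without sending it.

```python
import urllib.request


def build_tts_request(ssml, region, key, output_format="riff-24khz-16bit-mono-pcm"):
    """Build (but do not send) a POST request carrying an SSML payload.

    Hypothetical helper; region and key are placeholders supplied by the caller.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Tells the service that the body is SSML rather than plain text.
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "ssml-paragraph-demo",  # arbitrary example value
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    example_ssml = (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        '<voice name="en-US-JennyNeural"><p>First paragraph.</p><p>Second paragraph.</p></voice>'
        "</speak>"
    )
    request = build_tts_request(example_ssml, region="YOUR_REGION", key="YOUR_KEY")
    print(request.get_method(), request.full_url)
    # To actually synthesize, send the request with real credentials, e.g.:
    # with urllib.request.urlopen(request) as response:
    #     audio_bytes = response.read()
```

    Keeping request construction separate from sending makes it easy to inspect the headers and payload before any credentials are used.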

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please remember to close the thread by upvoting and accepting this as the answer if it is helpful.

  3. Gowtham CP 7,960 Reputation points Volunteer Moderator
    2025-12-23T08:42:48.7733333+00:00

    Hi Brasil Treinamentos Atendente 02 ,

    Yes, this change is expected with the current behavior of Azure Speech Studio Text to Speech.

    Plain text input is now treated as unstructured text, and blank lines or line breaks are no longer guaranteed to be interpreted as paragraph boundaries. Because of this, multiple text blocks can be synthesized as a single continuous audio output.

    There is no configuration or setting in Speech Studio to restore the previous paragraph-based behavior for plain text. To reliably control pauses and paragraph separation, SSML is now required using tags such as <p> or <s>.
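
    If the workflow depends on one audio file per text block, as the original behavior produced, one possible workaround is to generate a separate SSML document for each block and synthesize them individually. The sketch below only prepares the documents; `ssml_segments`, the file naming scheme, and the default voice are assumptions for illustration, and the synthesis call itself is left to whichever SDK or REST client you use.

```python
import re
from xml.sax.saxutils import escape


def ssml_segments(text, voice="en-US-JennyNeural", lang="en-US"):
    """Return a list of (file_name, ssml) pairs, one per blank-line-separated block.

    Hypothetical helper; file names and the default voice are example values.
    """
    text = text.replace("\r\n", "\n").strip()
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]

    segments = []
    for index, block in enumerate(blocks, start=1):
        # Escape XML special characters and flatten line breaks inside the block.
        body = escape(" ".join(block.split()))
        ssml = (
            f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
            f'<voice name="{voice}"><p>{body}</p></voice></speak>'
        )
        segments.append((f"segment_{index:03d}.wav", ssml))
    return segments


if __name__ == "__main__":
    for file_name, ssml in ssml_segments("Block one.\n\nBlock two.\n\n\nBlock three."):
        print(file_name, ssml)
```

    Each pair can then be passed to a synthesis call, writing the returned audio to the given file name.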

    This is not a known regression; it reflects a shift toward SSML as the supported and deterministic way to define structure in Text to Speech.

    References

    Azure Text to Speech https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech

    SSML support in Azure Speech https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup

    I hope this helps. If the answer is useful, please accept it to close the thread.
