Azure Speech Studio – Text to Speech behavior change regarding paragraph blocks

Brasil Treinamentos Atendente 02 5

Previously, when using Azure Speech Studio (Text to Speech) with plain text input, each text block separated by blank lines was treated as an independent paragraph and generated distinct audio segments.

Currently, the same input is being interpreted as a single continuous text, even when blocks are clearly separated by line breaks or blank lines.

As a result, different text blocks are being merged or interpolated into one audio output.

This behavior change is impacting existing workflows that relied on block-based text input without requiring SSML.

Questions:

Was this behavior intentionally changed?
Is there any configuration or option to restore the previous behavior?
Is SSML now mandatory to define paragraph boundaries?
Is this a known issue or regression?

Expected behavior:

Each text block separated by blank lines should be treated as a paragraph, without requiring manual SSML wrapping.

Thank you.

Answer 1

Anonymous

Hi Brasil Treinamentos Atendente 02

Azure Speech Studio previously treated each block of text separated by blank lines as independent paragraphs. This meant each block produced its own audio segment without requiring SSML. This behavior changed recently: Speech Studio now merges all blocks into a single continuous paragraph, even when blank lines exist. This is confirmed in the Microsoft Q&A thread describing the exact same regression.

There is no official Microsoft documentation stating that this was an intentional product update. The Q&A post shows that the community is treating it as an unexpected change. There is no announcement, no release note, and no configuration flag indicating that paragraph handling was redesigned. The current evidence therefore points to an unintentional behavioral change (likely a regression).

Because blank lines are no longer honored as paragraph breaks, the only guaranteed way to force separation is to use explicit SSML tags, such as:

<p>…</p> for paragraphs
<break time="Xms"/> for pauses

The thread confirms SSML is now required to ensure paragraph‑level separation when using Speech Studio.
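For scripts that were written as plain text blocks, the wrapping can be automated. The following is a minimal sketch, not an official tool: it assumes blocks are separated by one or more blank lines, and the function name `blocks_to_ssml` is hypothetical. The voice name is taken from the example elsewhere in this thread; substitute your own.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def blocks_to_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap each blank-line-separated block of plain text in its own <p> element."""
    # Normalize Windows line endings so the split below sees plain "\n".
    text = text.replace("\r\n", "\n").strip()
    paragraphs = []
    # A run of blank (or whitespace-only) lines marks a block boundary.
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if not block:
            continue
        # Join single line breaks inside a block and escape &, < and >.
        joined = " ".join(line.strip() for line in block.splitlines())
        paragraphs.append(f"    <p>{escape(joined)}</p>")
    body = "\n".join(paragraphs)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>\n"
        f"  <voice name={quoteattr(voice)}>\n"
        f"{body}\n"
        "  </voice>\n"
        "</speak>"
    )


if __name__ == "__main__":
    print(blocks_to_ssml("First block.\n\nSecond block,\nstill second block."))
```

The output still has to be pasted into an SSML-aware input (or sent through the SDK or REST API); pasting it into a plain text box would read the tags as literal characters.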

In summary: Azure Speech Studio’s Text‑to‑Speech engine no longer treats blank lines as paragraph separators, causing previously working multi‑block scripts to merge into one continuous audio output. Microsoft has not documented this as an intentional update, no setting exists to restore prior behavior, and SSML is now required to reliably define paragraphs. Based on the Q&A report, this is most consistent with an unintentional regression rather than a new feature design.

References

Azure Text to Speech https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech

SSML support in Azure Speech https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup

I hope this helps. Do let me know if you have any further queries.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

Thank you!

Maria 0 Reputation points

2025-12-30T17:42:21.69+00:00

Adding the <p> tags does not change the export in the Azure Speech Studio UI for me. Is there something I'm missing?

Anonymous

2025-12-30T19:46:12.7166667+00:00
Hi Maria

Speech Studio has both plain-text flows and SSML-based flows. Microsoft Q&A confirms that plain text is now treated as unstructured, that line breaks and blank lines are not guaranteed to be respected, and that the deterministic way to control structure is SSML (<p>, <s>, etc.). It also says there’s no UI setting to revert to the old behavior. [learn.microsoft.com]

So if you paste <p>...</p> into a plain text box (or a flow that still interprets it as plain text), it will be treated as literal characters and won’t change export behavior.

In Speech Studio “Audio Content Creation”, you must wrap your text in a valid SSML document.

In practice, you need something like this (minimal valid SSML):

   <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
     <voice name="en-US-JennyNeural">
       <p>First paragraph.</p>
       <p>Second paragraph.</p>
     </voice>
   </speak>

SSML structure examples explicitly show <p> and <s> as valid elements inside <voice> within <speak>. The Audio Content Creation tool is also explicitly SSML-based (that is the whole premise of the tool).
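Before pasting a document into the tool, you can check locally that it is well-formed and that the paragraph elements really sit inside a voice element. This is only a sketch of a local sanity check with Python's standard XML parser; the helper name `count_paragraphs` is hypothetical, and passing this check does not prove that Speech Studio will accept the document.

```python
import xml.etree.ElementTree as ET

# Default SSML namespace, as used in the <speak> element of the example above.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def count_paragraphs(ssml):
    """Return how many <p> elements appear inside <voice> elements.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML,
    and ValueError if the root element is not an SSML <speak>.
    """
    root = ET.fromstring(ssml)
    if root.tag != f"{{{SSML_NS}}}speak":
        raise ValueError("root element is not <speak> in the SSML namespace")
    total = 0
    for voice in root.iter(f"{{{SSML_NS}}}voice"):
        total += len(list(voice.iter(f"{{{SSML_NS}}}p")))
    return total
```

A result of 0 for a script that should have several paragraphs suggests the tags were lost or never added, for example because the text was exported from a plain-text flow.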

Your earlier symptom was audio getting cut/split around ~12–19 seconds. That’s a different problem than paragraph boundary recognition.

There is a separate Q&A report where Speech Studio export started splitting longer outputs into more files and users asked how to turn it off. That aligns with your “cut off / split” behavior. In other words:

<p> helps with structure / pauses / boundaries, but it may not override a new export segmentation limit in the UI export pipeline (the thing causing the ~12–19s chunks).

So even with correct SSML, Speech Studio may still split the exported audio into multiple files if the UI export path enforces chunking.

What to do right now (fast, concrete steps)

Make sure you are in SSML mode (not plain text). In Audio Content Creation, use SSML input (or paste SSML as the entire document as above).

Validate SSML is actually being parsed:

If the UI has “Download SSML”, click it and ensure it still contains your <p> tags (which means the editor accepted the SSML). [speech.microsoft.com]

Test a controlled SSML pause to prove SSML is active:

Add a visible pause: <break time="1500ms"/> between two sentences.

If preview/export still has no pause, you’re still not in SSML parsing mode or the export pipeline is overriding.

If your main pain is split/cut exports:

Try exporting via the whole file option (your workaround).

Or split the script yourself into multiple shorter SSML files (if the UI enforces per-clip duration). The docs even recommend splitting long scripts (20,000 character limit) in this tool.
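If you go the route of splitting the script yourself, the grouping can be done on paragraph boundaries so that no paragraph is cut in half. The sketch below is one possible approach, with a hypothetical helper name `chunk_paragraphs`. It counts only the paragraph text; whether the 20,000 character limit also counts SSML markup is an assumption I have not verified, so leave some headroom when choosing `max_chars`.

```python
def chunk_paragraphs(paragraphs, max_chars=20000):
    """Group paragraphs into chunks whose combined text length stays within max_chars."""
    chunks = []
    current = []
    size = 0
    for para in paragraphs:
        if len(para) > max_chars:
            # A single oversized paragraph cannot be placed anywhere; let the caller decide.
            raise ValueError("a single paragraph exceeds max_chars; split it manually")
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and size + len(para) > max_chars:
            chunks.append(current)
            current = []
            size = 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk can then be wrapped in its own SSML document and exported as a separate file.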

Answer 2

Hello Brasil Treinamentos Atendente 02,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that you are asking about the behavior change in Azure Speech Studio Text to Speech regarding paragraph blocks.

Yes, Azure Speech Studio’s plain text processor does not interpret blank lines as paragraph separators anymore, and there are no documented options to revert to the previous blank-line behavior.

To ensure each paragraph is treated separately in audio, convert plain text into SSML (Speech Synthesis Markup Language) and wrap text appropriately:

Use SSML Tags for Paragraphs:

   <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
     <voice name="en-US-JennyNeural">
       <p>First paragraph text here</p>
       <p>Second paragraph text here</p>
       <p>Third paragraph text here</p>
     </voice>
   </speak>

Optionally, you can add explicit pauses between sections with <break time="500ms"/>. SSML gives precise control over structure and timing, as described here: https://docs.azure.cn/en-us/ai-services/speech-service/how-to-speech-synthesis

In addition to the above, you have the following implementation options

A. In Azure Speech Studio UI, paste SSML directly into the input editor.

B. If you are using the Speech SDK, use APIs like SpeakSsmlAsync() instead of plain text input.

C. If you are using the REST API, set Content-Type: application/ssml+xml and send the SSML payload.
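For option C, a request could be assembled roughly as follows. This is a sketch using only the Python standard library, with a hypothetical helper name `build_tts_request`. The endpoint pattern, the `Ocp-Apim-Subscription-Key` and `X-Microsoft-OutputFormat` headers, and the output format value are written from memory of the Speech REST API, so treat them as assumptions and check them against the current documentation for your region. The function only builds the request; nothing is sent until you call `urllib.request.urlopen` on it.

```python
import urllib.request


def build_tts_request(region, subscription_key, ssml,
                      output_format="riff-24khz-16bit-mono-pcm"):
    """Build (but do not send) a POST request carrying an SSML payload."""
    # Assumed regional endpoint pattern; verify for your Speech resource.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Tells the service the body is SSML rather than plain text.
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "ssml-paragraph-demo",  # arbitrary client name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


# Usage (requires a real key and region; the response body is the audio):
# with urllib.request.urlopen(build_tts_request("westeurope", KEY, ssml)) as resp:
#     open("output.wav", "wb").write(resp.read())
```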

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

Maria 0 Reputation points

2025-12-30T17:44:11.1366667+00:00

Adding the <p> tags does not change the export in the Azure Speech Studio UI for me. Is there something I'm missing?

Answer 3

Hi Brasil Treinamentos Atendente 02,

Yes — this change is expected with the current behavior of Azure Speech Studio Text to Speech.

Plain text input is now treated as unstructured text, and blank lines or line breaks are no longer guaranteed to be interpreted as paragraph boundaries. Because of this, multiple text blocks can be synthesized as a single continuous audio output.

There is no configuration or setting in Speech Studio to restore the previous paragraph-based behavior for plain text. To reliably control pauses and paragraph separation, SSML is now required, using tags such as <p> or <s>.

This is not a known regression; it reflects a shift toward SSML as the supported and deterministic way to define structure in Text to Speech.

References

Azure Text to Speech https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech

SSML support in Azure Speech https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup

I hope this helps. If the answer is useful, please accept it to close the thread.

Maria 0 Reputation points

2025-12-30T17:45:20.58+00:00

Adding the <p> tags does not change the export in the Azure Speech Studio UI for me. Is there something I'm missing?
