
Azure Cognitive Services TTS fails to generate audio for certain Unicode characters (e.g., em dash, arrow symbols)

Pawel Pelka 20 Reputation points
2026-04-23T18:55:50.31+00:00

I'm using Azure Cognitive Services Text-to-Speech (via SSML) to generate audio from text content. Some Unicode characters cause the TTS engine to fail or produce no audio output, while others in the same Unicode block work correctly.

Characters that fail:

• — (U+2014, Em Dash)

• ← (U+2190, Leftwards Arrow)

• ↑ (U+2191, Upwards Arrow)

• → (U+2192, Rightwards Arrow)

• ↓ (U+2193, Downwards Arrow)

Characters that work:

• ↕ (U+2195, Up Down Arrow)

The failure is silent — the API does not return an error, but the resulting audio skips or truncates at the point where these characters appear. This behavior is inconsistent across characters within the same Unicode block (Arrows, U+2190–U+21FF).

Expected behavior: The character is either spoken or skipped gracefully, and audio generation continues.

Actual behavior: Audio generation fails or is truncated at the offending character.

Questions:

  1. Is there a known list of unsupported Unicode characters for TTS/SSML input?
  2. Is the recommended approach to strip or replace these characters before passing text to the API?
  3. Are there voice/locale-specific differences in Unicode support?
Azure Speech in Foundry Tools

Answer accepted by question author

Karnam Venkata Rajeswari 3,575 Reputation points Microsoft External Staff Moderator
2026-04-23T20:13:24.05+00:00

Hello @Pawel Pelka,

Welcome to Microsoft Q&A. Thank you for reaching out to us.

Speech synthesis uses Speech Synthesis Markup Language (SSML), which is an XML-based format that controls pronunciation, pacing, and structure of the generated audio. During this process, input text undergoes normalization and tokenization before being converted into speech. Some Unicode symbols do not consistently map to speech tokens, which can result in silent skipping or truncation without explicit API errors.
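Because SSML is XML, the text placed inside it also has to be escaped so that the document stays well-formed. Below is a minimal sketch of wrapping plain text in an SSML envelope. The function name `build_ssml`, the voice `en-US-JennyNeural`, and the `en-US` locale are assumptions chosen for illustration; substitute whatever your application actually uses.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed values for illustration only; use your own voice and locale.
VOICE_NAME = "en-US-JennyNeural"
LOCALE = "en-US"


def build_ssml(text: str, voice: str = VOICE_NAME, locale: str = LOCALE) -> str:
    """Wrap plain text in a minimal SSML envelope, escaping XML metacharacters."""
    body = escape(text)  # turns &, <, > into &amp;, &lt;, &gt;
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{locale}">'
        f'<voice name="{voice}">{body}</voice>'
        "</speak>"
    )


ssml = build_ssml("Press A & B, then move left.")

# Parsing the result is a cheap local check that the SSML is well-formed XML
# before it is sent to the service. It says nothing about how a symbol is spoken.
ET.fromstring(ssml)
print(ssml)
```

A local parse like this only catches malformed XML. It will not reveal the silent truncation described in the question, which happens later, during normalization on the service side.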

To answer your questions:

  1. There is currently no officially published list of unsupported Unicode characters for text-to-speech or SSML. Symbol support depends on internal normalization and speech mapping, and may vary across different characters even within the same Unicode range.
  2. Preprocessing the input text is highly recommended for production scenarios. Converting symbols into speech-friendly text gives consistent, complete audio output and prevents silent truncation. Suggested approaches include:
    • Replace symbolic characters with descriptive text:
      • ←: “left arrow”
      • →: “right arrow”
      • ↑: “up arrow”
      • ↓: “down arrow”
    • Replace punctuation such as the em dash with speech-compatible alternatives: a comma, a sentence break, or a controlled pause inserted with SSML.
    • Use SSML substitution when the meaning must be preserved, for example: <sub alias="left arrow">←</sub>
    • Prefer explicit pauses with the SSML <break> element over relying on punctuation for timing control. SSML supports inserting pauses directly in the speech sequence.
  3. Voice and locale differences can influence how text is normalized and spoken. SSML supports multiple voices, languages, and speech configurations, and each may process input text slightly differently. While some characters may work in specific combinations, the behavior is not deterministic enough to rely on for consistent output. Unsupported or weakly supported symbols may not always be gracefully skipped, and in some cases can disrupt the synthesis stream. Since such conditions do not always return explicit errors, they appear as audio gaps or truncation. To ensure reliable results, please consider the following approaches:
    • Normalize input text (Unicode normalization before synthesis)
    • Replace or map symbolic Unicode characters to natural language equivalents
    • Use SSML elements such as <break> for pauses and <sub> for substitutions
    • Avoid passing raw symbolic characters directly to speech synthesis pipelines
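The preprocessing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not an official recipe: the helper names `to_ssml_body` and `unmapped_symbols` are hypothetical, the 300 ms pause for an em dash is an arbitrary choice, and the symbol table only covers the characters reported in the question. The `use_sub=False` mode drops the raw symbol entirely, which is the safer option if the symbol still causes trouble inside `<sub>`.

```python
import unicodedata
from xml.sax.saxutils import escape, quoteattr

# Spoken replacements for the arrows listed in the question; extend as needed.
SPOKEN = {
    "\u2190": "left arrow",
    "\u2191": "up arrow",
    "\u2192": "right arrow",
    "\u2193": "down arrow",
}
# Assumption: an em dash (U+2014) becomes a short pause. 300ms is arbitrary.
PAUSES = {"\u2014": '<break time="300ms"/>'}


def to_ssml_body(text: str, use_sub: bool = True) -> str:
    """Return an SSML fragment: known symbols substituted, everything else escaped."""
    text = unicodedata.normalize("NFC", text)
    parts = []
    for ch in text:
        if ch in SPOKEN:
            if use_sub:
                # Keeps the original symbol in the markup, spoken via its alias.
                parts.append(f"<sub alias={quoteattr(SPOKEN[ch])}>{ch}</sub>")
            else:
                # Drops the symbol and emits only the descriptive words.
                parts.append(SPOKEN[ch])
        elif ch in PAUSES:
            parts.append(PAUSES[ch])
        else:
            parts.append(escape(ch))
    return "".join(parts)


def unmapped_symbols(text: str) -> list[str]:
    """List non-ASCII symbol characters (categories Sm, So) with no mapping yet."""
    return sorted(
        {
            ch
            for ch in text
            if ord(ch) > 127
            and unicodedata.category(ch) in ("Sm", "So")
            and ch not in SPOKEN
        }
    )


sample = "Press \u2192 to continue \u2014 then \u2195 to scroll."
print(to_ssml_body(sample))
print(to_ssml_body(sample, use_sub=False))
print(unmapped_symbols(sample))
```

The fragment returned by `to_ssml_body` is meant to be placed inside a `<voice>` element. Logging the result of `unmapped_symbols` for real input is one way to build up the replacement table over time, since there is no published list of unsupported characters to start from.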

The following references might be helpful; please check them out.

Thank you
