Note: See the Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Speech Synthesis Markup Language Reference (Microsoft.Speech)

Speech Synthesis Markup Language (SSML) is an XML-based markup language that application developers use to control characteristics of synthetic speech (text-to-speech, or TTS) output, including voice, pitch, rate, volume, and pronunciation.

The Microsoft implementation of SSML is based on the World Wide Web Consortium (W3C) Speech Synthesis Markup Language (SSML) Version 1.0 specification.

All SSML elements belong to the ssml namespace. The following elements are implemented in the Microsoft Speech Platform SDK 11.

| SSML Element | Description | Usage | Attributes |
| --- | --- | --- | --- |
| audio | Supports the insertion of recorded audio files. | Optional | src |
| break | An empty element used to control the prosodic boundaries between words. | Optional | strength, time |
| emphasis | Increases the level of stress with which the contained text is spoken. | Optional | level |
| lexicon | Specifies a lexicon document that contains the pronunciations for the content of the document. | Optional | uri, type |
| mark | Designates a specific reference point in the text sequence. This element can also be used to mark an output audio stream for asynchronous notification. | Optional | name |
| p and s | Denote the paragraph and sentence structure of the document. | Optional | xml:lang |
| phoneme | Indicates the phonetic pronunciation for the contained text. Overrides the pronunciations in the lexicon, if one is specified. | Optional | ph, alphabet |
| prosody | Controls the pitch, rate, and volume of the speech output. | Optional | pitch, contour, range, rate, duration, volume |
| say-as | Indicates the type of text contained in the element (such as acronym, number, or date). | Optional | interpret-as, format, detail |
| speak | The required root element for all SSML documents. | Required | version, xmlns, xml:lang |
| sub | Specifies a string of text that should be pronounced in place of the text contained in the element. | Optional | alias |
| voice | Specifies a voice and its attributes to be used for synthesized speech. Often used to change from one voice to another. | Optional | xml:lang, gender, age, variant, name |
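
To make the element list above more concrete, the following Python sketch embeds a small SSML document and checks that it is well-formed and that every element belongs to the SSML namespace. It is an illustration only, not part of the Microsoft Speech Platform SDK. The namespace URI is the one defined by the W3C SSML 1.0 specification. The sample text, the attribute values (for example `interpret-as="date"` with `format="mdy"`), and the helper names `split_tag` and `local_names` are assumptions chosen for the example; this page does not confirm which attribute values a given engine accepts.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Illustrative document (assumed content): <speak> is the required root and
# carries the version, xmlns, and xml:lang attributes listed in the table.
SSML_DOC = f"""<?xml version="1.0"?>
<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  <p>
    <s>Your order ships on <say-as interpret-as="date" format="mdy">10/19/2012</say-as>.</s>
    <s>
      <prosody rate="slow" volume="loud">Please keep your receipt.</prosody>
      <break time="500ms"/>
      Contact <sub alias="customer service">CS</sub> with questions.
    </s>
  </p>
</speak>"""


def split_tag(tag):
    """Split an ElementTree tag of the form '{namespace}local' into its two parts."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag  # element has no namespace


def local_names(ssml):
    """Parse an SSML string and return each element's local name in document order.

    Raises ValueError if any element is outside the SSML namespace.
    """
    root = ET.fromstring(ssml)
    names = []
    for element in root.iter():
        namespace, local = split_tag(element.tag)
        if namespace != SSML_NS:
            raise ValueError(f"<{local}> is not in the SSML namespace")
        names.append(local)
    return names


if __name__ == "__main__":
    print(local_names(SSML_DOC))
    # prints ['speak', 'p', 's', 'say-as', 's', 'prosody', 'break', 'sub']
```

The check only confirms XML well-formedness and namespace membership. It does not validate attribute values or guarantee that a particular speech engine will render the document as intended.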

In This Section