Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
prosody Element (Microsoft.Speech)
Specifies the pitch, contour, range, rate, duration, and volume for speaking the contained text.
Syntax
<prosody pitch="value" contour="value" range="value" rate="value" duration="value" volume="value"> </prosody>
Attributes
Attribute | Description |
---|---|
pitch | Optional. Indicates the baseline pitch for the contained text. This value may be expressed in one of three ways: as an absolute value, a relative value, or an enumeration value. |
contour | Optional. Represents changes in pitch for speech content as an array of targets at specified time positions in the speech output. Each target is defined by a set of parameter pairs. |
range | Optional. A value that represents the range of pitch for the contained speech content. This value may be expressed using the same absolute values, relative values, or enumeration values used to describe pitch (see above). |
rate | Optional. Indicates the speaking rate of the contained text. This value may be expressed in one of two ways. |
duration | Optional. A value in seconds or milliseconds for the period of time that should elapse while the speech synthesis (TTS) engine reads the contents of the element. For example, 2s or 1800ms. |
volume | Optional. Indicates the volume level of the speaking voice. This value may be expressed in one of three ways. |
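To make the duration format concrete, the following Python sketch converts values such as 2s or 1800ms to milliseconds. The helper name `parse_duration_ms` is hypothetical and not part of Microsoft.Speech, and the sketch assumes that only non-negative numbers with an `s` or `ms` suffix are accepted.

```python
import re

# Hypothetical helper, not part of Microsoft.Speech.
# Assumption: a duration is a non-negative number followed by "s" or "ms".
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")


def parse_duration_ms(value: str) -> float:
    """Convert a duration attribute value to milliseconds."""
    match = _DURATION.match(value.strip())
    if match is None:
        raise ValueError(f"not a duration in seconds or milliseconds: {value!r}")
    number, unit = match.groups()
    # Seconds are scaled by 1000; milliseconds are returned unchanged.
    return float(number) * (1000.0 if unit == "s" else 1.0)


print(parse_duration_ms("2s"))      # 2000.0
print(parse_duration_ms("1800ms"))  # 1800.0
```

A check like this can catch a malformed value before the SSML is sent to the engine.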
Note
The XML standard requires attribute values to be enclosed in matching quotation marks, either double or single. For example, <prosody volume="90"> is a well-formed element, but <prosody volume=90> is not.
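The quoting rule can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the function name `is_well_formed` is an illustrative choice. It tests well-formedness only and does not validate the markup against the SSML schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Quoted attribute value: parses.
print(is_well_formed('<prosody volume="90">Hello</prosody>'))  # True
# Unquoted attribute value: rejected by the parser.
print(is_well_formed('<prosody volume=90>Hello</prosody>'))    # False
```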
Remarks
Because prosodic attribute values can vary over a wide range, the speech synthesis (TTS) engine interprets the assigned values as suggestions for the actual prosodic values of the selected voice. The engine limits or substitutes values that are not supported. Examples of unsupported values are a pitch of 1 MHz or a volume of 120.
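How an engine limits an out-of-range value is not specified here. One plausible reading is simple clamping, sketched below in Python. The function `clamp_volume` and the 0 to 100 bounds are assumptions made for illustration; they do not describe the Microsoft Speech Platform's actual behavior.

```python
def clamp_volume(requested: float, lowest: float = 0.0, highest: float = 100.0) -> float:
    """Limit a requested volume to an assumed supported range.

    The 0-100 bounds are an assumption for this sketch, not a documented
    guarantee of any particular engine.
    """
    return max(lowest, min(highest, float(requested)))


print(clamp_volume(120))  # 100.0 (above the assumed range, so limited)
print(clamp_volume(90))   # 90.0 (within range, so unchanged)
```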
Note
The speech synthesis engines for the Microsoft Speech Platform do not support the contour, range, or duration attributes at this time. Setting values for these attributes will produce no change in the synthesized speech output.
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0"
    xmlns="http://www.w3.org/2001/10/synthesis"
    xml:lang="en-US">
  <s>
    Your order for <prosody pitch="+1st" rate="-10%" volume="90"> 8 books and 1 reading lamp </prosody>
    will be shipped tomorrow.
  </s>
</speak>
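When the spoken text comes from application data, building the markup programmatically avoids quoting and escaping mistakes. The following Python sketch produces an equivalent document with the standard `xml.etree.ElementTree` module. The function name `build_order_ssml` and its `items` parameter are hypothetical; the attribute values are the ones used in the example above.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_order_ssml(items: str) -> str:
    """Build an SSML document that applies prosody settings to `items`.

    Hypothetical helper for illustration; not part of Microsoft.Speech.
    """
    # Serialize SSML elements in the default namespace (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": "en-US"},
    )
    sentence = ET.SubElement(speak, f"{{{SSML_NS}}}s")
    sentence.text = "Your order for "
    prosody = ET.SubElement(
        sentence,
        f"{{{SSML_NS}}}prosody",
        {"pitch": "+1st", "rate": "-10%", "volume": "90"},
    )
    prosody.text = items
    # Text that follows the closing </prosody> tag.
    prosody.tail = " will be shipped tomorrow."
    return ET.tostring(speak, encoding="unicode")


print(build_order_ssml("8 books and 1 reading lamp"))
```

The serializer always quotes attribute values and escapes characters such as `&` in the text, so the result stays well-formed whatever string is passed in.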