Speech Output

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

The Speech Engine Services (SES) engines interpret Speech Synthesis Markup Language (SSML) from a voice response application to produce audio that is played to the user. SES uses two engines to produce audio output: a prompt engine and a text-to-speech (TTS) synthesis engine.

Prompt Engine

The SES prompt engine compares SSML to a database of prerecorded .wav files, or prompts. The database is a component of the speech application.

To generate speech, the SES prompt engine searches the prompt database for a match to the text it receives from the application. It can concatenate several prompts to produce the complete output. If the prompt engine cannot match any part of the text to a prerecorded prompt, it sends that word or phrase to the TTS synthesis engine for processing.
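The matching algorithm that SES uses internally is not described here. The following Python sketch shows one plausible approach, greedy longest-phrase matching, only to illustrate the search, concatenate, and fall-back-to-TTS behavior. `PROMPT_DB` and `plan_output` are hypothetical names, and the text normalization (lowercasing and splitting on whitespace) is deliberately simplified.

```python
from typing import Dict, List, Tuple

# Hypothetical prompt database: transcription text -> prerecorded .wav file.
# A real SES prompt database is built with the Speech Server tools; this
# dict only stands in for that lookup.
PROMPT_DB: Dict[str, str] = {
    "welcome to": "welcome_to.wav",
    "contoso bank": "contoso_bank.wav",
    "your balance is": "balance_is.wav",
}

def plan_output(text: str, db: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split text into ("prompt", wav) and ("tts", words) segments.

    At each position, take the longest run of words found in the database.
    Consecutive unmatched words are grouped and handed to TTS.
    """
    words = text.lower().split()
    max_len = max((len(k.split()) for k in db), default=0)
    segments: List[Tuple[str, str]] = []
    pending: List[str] = []  # unmatched words waiting to go to TTS
    i = 0
    while i < len(words):
        match = None
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in db:
                match = (phrase, n)
                break
        if match is None:
            pending.append(words[i])
            i += 1
            continue
        if pending:  # flush unmatched words before the next prompt
            segments.append(("tts", " ".join(pending)))
            pending = []
        segments.append(("prompt", db[match[0]]))
        i += match[1]
    if pending:
        segments.append(("tts", " ".join(pending)))
    return segments

print(plan_output("Welcome to Contoso Bank your balance is 42 dollars", PROMPT_DB))
# [('prompt', 'welcome_to.wav'), ('prompt', 'contoso_bank.wav'),
#  ('prompt', 'balance_is.wav'), ('tts', '42 dollars')]
```

Preferring the longest match keeps the number of concatenated recordings small, which usually sounds more natural than stitching together many short prompts.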

Note

The prompt engine provided with Speech Server supports only SSML; it does not support Speech API (SAPI) TTS markup. The SSML supported by Speech Server and implemented in the Speech Server Developer Tools is based on the World Wide Web Consortium (W3C) Speech Synthesis Markup Language (SSML) Version 1.0 specification of September 7, 2004.
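For reference, an SSML 1.0 document has a `<speak>` root element in the `http://www.w3.org/2001/10/synthesis` namespace with `version` and `xml:lang` attributes. The following Python sketch builds such a document with the standard `xml.etree.ElementTree` module; `build_ssml` is a hypothetical helper, and how a particular application hands the markup to SES is not shown.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serializes as xml:lang

def build_ssml(text: str, lang: str = "en-US") -> str:
    """Return a minimal SSML 1.0 document that wraps text in <speak>."""
    # Serialize the SSML namespace as the default namespace (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    speak.text = text  # ElementTree escapes characters such as & and <
    return ET.tostring(speak, encoding="unicode")

print(build_ssml("Welcome to Contoso Bank."))
```

Building the markup with an XML library rather than string formatting means that text such as "AT&T" is escaped correctly instead of producing malformed SSML.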

TTS Synthesis Engine

RealSpeak Telecom, the TTS synthesis engine used by SES, is provided by Nuance Communications, Inc.

The SES prompt engine passes text phrases not found in the prompt database to the RealSpeak TTS engine. The RealSpeak engine uses speech-synthesis techniques to generate an audio stream that approximates a human voice reading the source text. For English (United States), Jill is the female voice (the default) and Tom is the male voice. When additional language packs are installed, additional voices are available, as shown in the following table.

Language                   Male Voice      Female Voice
English (United States)    Tom             Jill
English (United Kingdom)   Daniel          Emily
French (Canada)            Felix           Julie
German (Germany)           Not available   Steffi
Spanish (United States)    Javier          Paulina
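An application that chooses a voice per language can encode the voice table as a lookup. In the following Python sketch, the language tags used as keys, the `pick_voice` helper, the female default for languages other than English (United States), and the fall-back-to-the-other-gender policy are all illustrative assumptions, not SES behavior.

```python
from typing import Dict, Optional

# Voices from the table, keyed by language tag. The tags ("en-US", etc.)
# are an illustrative choice, not identifiers defined by SES.
# None marks a voice that the table lists as not available.
VOICES: Dict[str, Dict[str, Optional[str]]] = {
    "en-US": {"male": "Tom", "female": "Jill"},
    "en-GB": {"male": "Daniel", "female": "Emily"},
    "fr-CA": {"male": "Felix", "female": "Julie"},
    "de-DE": {"male": None, "female": "Steffi"},
    "es-US": {"male": "Javier", "female": "Paulina"},
}

def pick_voice(lang: str, gender: str = "female") -> Optional[str]:
    """Return a voice for lang, falling back to the other gender if needed.

    Returns None when the language is not in the table (for example,
    because its language pack is not installed).
    """
    voices = VOICES.get(lang)
    if voices is None:
        return None
    other = "male" if gender == "female" else "female"
    # `or` skips a None entry such as the German male voice.
    return voices.get(gender) or voices.get(other)

print(pick_voice("en-US"))          # Jill
print(pick_voice("de-DE", "male"))  # Steffi (no German male voice)
```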

By default, the TTS volume (amplitude) is set to 30 percent of the maximum volume; you can adjust it in the RealSpeak engine. For example, if the volume is set too high, you might experience speech echo that triggers an unintentional barge-in, interrupting a system prompt.
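To put the default in perspective, assume that the percentage is a linear scale on signal amplitude (an assumption; the RealSpeak User's Guide defines what the setting actually controls). The default level then sits roughly 10.5 dB below maximum:

```latex
G_{\mathrm{dB}} = 20 \log_{10}\!\left(\frac{A}{A_{\max}}\right)
              = 20 \log_{10}(0.30) \approx -10.5\ \mathrm{dB}
```

Under the same assumption, halving the setting to 15 percent lowers the level by a further 6 dB, because 20 log10(0.5) ≈ -6.0 dB.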

Note

If you modify the TTS volume, you might need to change the volume in your prompt databases to match. For more information, see Prompt Databases in Web-based Voice Response Applications.
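One way to bring prerecorded prompts in line with a changed TTS level is to scale the amplitude of the source recordings. The following Python sketch assumes uncompressed 16-bit PCM .wav files that you process before adding them to a prompt database; `scale_pcm16` and `rescale_wav` are hypothetical helpers, and the gain value has to come from your own listening tests or level measurements.

```python
import sys
import wave
from array import array

def scale_pcm16(frames: bytes, gain: float) -> bytes:
    """Scale little-endian 16-bit PCM samples by gain, clipping to int16."""
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    for i, s in enumerate(samples):
        # Clamp instead of letting loud samples wrap around.
        samples[i] = max(-32768, min(32767, round(s * gain)))
    if sys.byteorder == "big":
        samples.byteswap()
    return samples.tobytes()

def rescale_wav(src_path: str, dst_path: str, gain: float) -> None:
    """Copy a 16-bit PCM .wav file, scaling its amplitude by gain."""
    with wave.open(src_path, "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = reader.readframes(params.nframes)
    with wave.open(dst_path, "wb") as writer:
        writer.setparams(params)  # keep channels, rate, and format
        writer.writeframes(scale_pcm16(frames, gain))

# Example with hypothetical file names:
# rescale_wav("welcome.wav", "welcome_quieter.wav", 0.8)
```

Clipping matters mainly when the gain is greater than 1; without it, samples that exceed the 16-bit range would wrap around and produce loud distortion.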

The RealSpeak TTS engine and the RealSpeak User's Guide are installed by default during Speech Server setup.

See Also

Concepts

Speech Recognition