Using Prompt Databases

07/05/2006

A prompt is a question or information spoken by a speech application. Typically a prompt is a question, such as "On what date do you wish to depart?" It can also be a greeting, such as "How may I help you?" or provide direction, such as "Press three."

To add prompts to a page, either use the Speech Application Language Tags (SALT) prompt element or add prompts to controls. QA, Command, and Microsoft ASP.NET Application Speech Controls in voice-only applications can include prompts. A prompt added to a control takes one of two forms: an inline prompt, or a prompt function.

The prompt engine renders prompts as text-to-speech (TTS) or as speech that is professionally recorded in a studio environment. Voice-only applications include prompts, but multimodal applications do not. Voice-only applications use prompts as the only method of communication with the user.

Speech quality is important to the impression that an application makes on users. Consider three levels of speech quality delivered by an application.

The best quality consists of professionally recorded prompts.
Intermediate quality consists of prompts recorded by amateur voice talent in a typical office environment.
Minimal quality consists of prompts spoken in TTS, a synthesized voice.

The prompts database contains recorded prompts (.wav files) and their transcriptions. Use Speech Prompt Editor, a tool included in the Microsoft Speech Application SDK Version 1.1 (SASDK), to record and edit prompts, and to manage prompt databases.

Prompts databases may also contain other audio used in the application. These non-speech entries might include things like silence (used to augment the timing of real prompts), background music, welcome jingles, page transition sounds and beeps. It is good practice to keep all of the non-speech prompt entries in a separate prompts project.

Prompt Engine Overview

Specify prompt text in code, either in the .aspx file using prompt engine markup language, or in a script function. The prompt engine converts written text into recorded audio output. The prompt engine produces synthetic speech by concatenating recorded words and phrases stored in the prompt database. The prompt engine searches the database for recorded segments that can be combined to match the specified text. For any segment, it might find one matching recording, more than one, or none.

When the prompt engine finds multiple matching recordings, it selects the best set of prompts to use. To determine the best set, the prompt engine estimates how natural two speech segments will sound when concatenated. It compares the phonetic context of each original recording with the new phonetic context and selects the recordings whose contexts are more similar. To override the default prompt selection, the developer must do two things.

Add tags to extractions
Use the prompt engine's peml:withtag XML markup to request specific prompts

If the prompt engine fails to locate any recording for a segment, it uses a TTS engine to synthesize that segment's speech. The resulting prompt is rendered as TTS. The prompt engine does not mix prompts and TTS in a single prompt request unless the user specifically requests a mix using a TTS XML markup tag. To avoid using TTS except when necessary, use the Prompt Validation tool, in Speech Prompt Editor, to verify prompt coverage in the application.
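The matching behavior described in this section can be pictured with a small conceptual model. The following sketch is not the prompt engine's actual algorithm. It assumes a hypothetical in-memory list of extraction transcriptions (`extractions`), uses a simple greedy longest-match over words, and renders the whole prompt as TTS as soon as one word has no recording. The names `toWords` and `planPrompt` are illustrative only.

```javascript
// Conceptual model only: the real prompt engine's search is not shown here.
// `extractions` is a hypothetical list of recorded phrase transcriptions.
const extractions = ["did you say a", "large", "small", "mushroom", "pizza"];

// Normalize text to lowercase words without punctuation.
function toWords(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9' ]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Greedy longest-match: at each position, take the longest recorded phrase.
// Returns { mode: "recorded", segments } or { mode: "tts", missing }.
function planPrompt(text, recorded) {
  const known = new Set(recorded);
  const words = toWords(text);
  const segments = [];
  let i = 0;
  while (i < words.length) {
    let matched = null;
    for (let end = words.length; end > i; end--) {
      const candidate = words.slice(i, end).join(" ");
      if (known.has(candidate)) {
        matched = candidate;
        i = end;
        break;
      }
    }
    if (matched === null) {
      // No recording covers this word: the whole prompt falls back to TTS.
      return { mode: "tts", missing: words[i] };
    }
    segments.push(matched);
  }
  return { mode: "recorded", segments };
}
```

In this model, "Did you say a large mushroom pizza?" resolves to four recorded segments, while "Did you say a medium pizza?" falls back to TTS because "medium" has no recording. A greedy match is the simplest choice for illustration; it does not attempt the naturalness comparison that the prompt engine performs when several recordings match.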

Creating the Recording Set

Begin creating prompts for the application by determining what prompts are needed. First, design the dialogs for the application, then step through them. What questions does the application need to ask? When scripting prompts, also consider the following issues.

Directed vs. open prompts
Which confirmations the application needs
Which confirmation style the application will use
- Explicit Confirmation (EC)
- Implicit Confirmation (IC)
- Short Time-out Confirmation (STC)
Prompts that the application plays in response to user silence and mumbles
Prompts that the application plays in response to requests like "Help" and "Repeat, please"

Eliminating Duplicate Recordings

Reduce the number of prompts by eliminating duplicates and using general rather than specific prompts. For example, use "I did not understand what you said" twice rather than the two prompts "I did not understand the phone number" and "I did not understand the area code."

Look for duplicate phrases and eliminate them. Use the Extractions window in Speech Prompt Editor to define prompt components that can be recombined to assemble the full variety of prompts that the application requires.

For example, in a pizza-ordering scenario, one confirmation prompt might say "Did you say a large mushroom pizza?" The constant elements in this phrase are the words "Did you say a" and "pizza." The words for size ("small," "medium," and "large") are variables that the prompt engine inserts when appropriate. The words for variety ("cheese," "plain," "mushroom," and "pepperoni") are also variables. Record a basic phrase, such as "Did you say a large mushroom pizza?", then record all the necessary variables to build all the variants of the phrase ("small," "medium," "pepperoni," and other options). Use script functions, specified by the PromptSelectFunction property, to combine extractions dynamically at run time into full sentences. This process eliminates the need to record every full phrase.
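The pizza confirmation can be sketched as a script function that assembles the prompt text from the constant and variable parts. This is a minimal illustration, not code from the SDK samples: the function name `confirmPizzaPrompt`, the two word lists, and the choice of fallback prompt are assumptions made for the example.

```javascript
// Hypothetical prompt function: assembles a confirmation from extraction text.
// The word lists mirror the sizes and varieties named in the text above.
const SIZES = ["small", "medium", "large"];
const VARIETIES = ["cheese", "plain", "mushroom", "pepperoni"];

function confirmPizzaPrompt(size, variety) {
  if (!SIZES.includes(size) || !VARIETIES.includes(variety)) {
    // Assumption: fall back to a general prompt rather than emit
    // text for which no extraction was recorded.
    return "I did not understand what you said.";
  }
  return "Did you say a " + size + " " + variety + " pizza?";
}
```

With three sizes and four varieties, this function can produce twelve different confirmations. Under the recording plan described above, those twelve come from six recordings: the basic phrase, two more sizes, and three more varieties.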

Recording High Quality Prompts

Spoken output is very important for a voice-only application, because it is the only way to communicate with the user. Make the output of the application clear, intelligible, efficient, unambiguous, and as pleasant as possible.

A number of factors make this goal somewhat difficult to attain. First, many applications, like an e-mail reader, handle data that is not known at design time, and therefore the application must use TTS. For the remaining output that is known at design time, it is often impractical or impossible to script and record every possible variation. Think about a spoken e-mail header as an example: "Message from Joe entitled Hello arrived at ten twenty AM on December 14." The recorded sections, such as "Message from" and "entitled," need to interface with segments rendered as TTS, such as the name of the sender and the subject line. The latter part of the sentence provides a different challenge, because recording every time of day along with every date is a sizeable task.

Second, consider the problem of producing the recorded prompt "You can say Next, Previous, or Repeat." At design time, the author knows that more commands may be added in the future, or that this particular prompt only uses a subset of the possible commands. The system must be able to read any number of the commands, the commands may be spoken in any order, and the entire prompt must sound natural.

One option is to record each of the elements of the prompt in isolation, taking care to eliminate the coarticulatory effects of the other words. For example, record the following units.

You can say
Next
Repeat
or
Previous

However, this recording results in a prompt that sounds very stilted. The following two techniques can be used to improve the naturalness of the final prompt.

Insert the word "Patrick" in the transcription between each of the elements to be extracted:

Patrick, Next, Patrick, Repeat, Patrick, or Patrick Previous.
Use the word "Patrick" because it both starts and ends in an unvoiced plosive sound (P and K sounds). This property reduces the coarticulatory effects that speaking one word has on the sound of the next.
The second technique is to make separate recordings, in which the words "Next," "Previous," and "Repeat" occur in each of the three positions in the sentence.
- Next, Previous, or Repeat.
- Repeat, Next, or Previous.
- Previous, Repeat, or Next.
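One way to draft recording scripts in which every word occurs once in every position is to rotate the word list. The following sketch assumes a list of three or more words; the function name `rotationScripts` is hypothetical, and any other arrangement that places each word in each position would serve equally well.

```javascript
// Generate recording scripts in which every word appears once in every
// position. Uses cyclic rotations of the input list.
// Assumes the list contains at least three words.
function rotationScripts(words) {
  return words.map((_, shift) => {
    const rotated = words.slice(shift).concat(words.slice(0, shift));
    const head = rotated.slice(0, -1).join(", ");
    return head + ", or " + rotated[rotated.length - 1] + ".";
  });
}
```

For the list Next, Previous, Repeat, the function returns three sentences, one per rotation, so the voice talent records each command at the start, in the middle, and at the end of a sentence.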

Words often sound different when spoken in different places within a sentence. Recording the words in several places within a sentence allows the application to choose from several examples, using tags to determine which recording is preferable. Speech Prompt Editor allows the developer to select an extraction that bears a particular tag. The prompt engine then combines the extraction with other extractions at run time to build a complete sentence.
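The idea of choosing a recording by sentence position can be modeled as follows. This sketch only plans which tagged extraction to request for each word; it does not call the prompt engine. The tag names ("intro," "first," "middle," "last," "conjunction") and the function names are assumptions invented for the example, not tags defined by Speech Prompt Editor.

```javascript
// Hypothetical model: each command has recordings tagged by sentence position.
function positionTag(index, total) {
  if (index === 0) return "first";
  if (index === total - 1) return "last";
  return "middle";
}

// Returns the ordered list of { text, tag } segments to look up.
// Works for any number of commands, in any order.
function buildCommandPrompt(commands) {
  const segments = [{ text: "You can say", tag: "intro" }];
  commands.forEach((command, index) => {
    if (commands.length > 1 && index === commands.length - 1) {
      segments.push({ text: "or", tag: "conjunction" });
    }
    segments.push({ text: command, tag: positionTag(index, commands.length) });
  });
  return segments;
}
```

For the commands Next, Previous, and Repeat, the plan requests "Next" with the first-position tag, "Previous" with the middle-position tag, and "Repeat" with the last-position tag, with "or" inserted before the final command.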

Using Wave Editor to Make Final Adjustments

Wave Editor is a key tool in improving prompt quality. The prompts database stores prompts as .wav files. Wave Editor performs three functions.

It displays a graphical view of .wav file data.
It allows developers to edit the word boundaries within a .wav file.
It allows developers to cut, copy, and paste wave segments both between and within .wav files.

The art of prompt recording involves using techniques like those in the following list to provide .wav file sources that are easy to edit.

Use neutral intonations, so that words sound as natural as possible when spliced together in different sequences.
Make judicious use of pauses. Adding to the buffer between words makes defining splicing boundaries easier, but adding too long a pause makes the speech sound stilted when combined with other words.
Use a consistent voice throughout the recordings for an application. Using comparable volumes throughout the recording set allows the prompt engine to splice words together more naturally. Use volume normalization as a technical means to supplement work done in the recording studio.
Speech Prompt Editor determines boundaries between words programmatically. When defining extractions within a .wav file, the application uses these word boundaries to determine the start and end points for the extraction. As a final step, verify and adjust word boundaries in Wave Editor to ensure that breaks occur at the actual word boundaries.
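Volume normalization can take several forms. The following sketch shows peak normalization, one common and simple variant, applied to an array of floating-point samples in the range -1 to 1. It is a generic illustration and makes no claim about the method that Speech Prompt Editor or Wave Editor uses; the function name `normalizePeak` and the default target of 0.9 are assumptions.

```javascript
// Peak normalization: scale samples so the loudest one reaches targetPeak.
// `samples` are floating-point amplitudes in the range [-1, 1].
function normalizePeak(samples, targetPeak = 0.9) {
  const peak = samples.reduce((max, s) => Math.max(max, Math.abs(s)), 0);
  if (peak === 0) {
    return samples.slice(); // pure silence: nothing to scale
  }
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}
```

Applying the same target peak to every recording in the set brings quiet and loud takes to comparable levels before the words are spliced together. Peak normalization matches maximum amplitude rather than perceived loudness, so it supplements consistent recording technique and does not replace it.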

Selecting Prompts at Run Time

QA, Command, and Application Speech Controls in voice-only applications can include prompts. Add a prompt to a control in one of two forms.

As an inline prompt
- or -
As a prompt function

An inline prompt is a piece of static text that the prompt engine plays when the application activates a control. A good example is a WelcomeQA control that plays a cheery, professional greeting whenever a user opens the page. Inline prompts can be entered in any of three ways.

In the Properties window
- or -
On the Voice-Only Prompt page
- or -
Using the Prompt tag in HTML view

The following code example plays a greeting rendered as TTS.

<speech:QA id="qa1" PlayOnce="true" runat="server">
    <Prompt inlinePrompt="Welcome to the zoo!" />
</speech:QA>

Only one inline prompt exists per control. If a control has an inline prompt, that prompt is the only prompt the control plays. An application cannot change an inline prompt at run time. Although inline prompts are simple and easy to add, they lack flexibility.

A prompt function, however, dynamically generates a prompt at run time. Use Prompt Function Editor to add prompt functions to QA, Command, or Application Speech Controls.
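As a rough illustration of what a prompt function does, the following sketch returns different prompt text depending on what happened on the previous turn. The function name, the parameter names `lastCommandOrException` and `count`, and the values "Silence," "NoReco," and "Help" are assumptions made for this example; check the SDK reference for the parameters that Prompt Function Editor actually supplies.

```javascript
// Sketch of a prompt function. The parameter names and the values
// "Silence", "NoReco", and "Help" are assumptions made for illustration.
function askDepartureDatePrompt(lastCommandOrException, count) {
  switch (lastCommandOrException) {
    case "Silence":
      return "Sorry, I did not hear you. On what date do you wish to depart?";
    case "NoReco":
      // Escalate to a more directive prompt after repeated failures.
      return count > 1
        ? "Sorry, I still did not understand. Please say a date, such as December 14."
        : "Sorry, I did not understand. On what date do you wish to depart?";
    case "Help":
      return "Please say the date of your departure, such as December 14.";
    default:
      return "On what date do you wish to depart?";
  }
}
```

Unlike an inline prompt, the same control can now play the plain question on the first turn and an error or help variant on later turns.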

There are three general purposes for prompt functions. The examples cited for each purpose are in the Samples project of the Samples.sln file.

Use prompt functions to provide confirmation prompts that change according to customer behaviors. For examples of this type of prompt, see:
- Prompts.pf in the UserInputAndResults directory
- PromptHelper.js in the TableNavigation directory
Use prompt functions to provide error or help prompts. For examples of these uses, see:
- The Prompts.pf file in the SilenceAndLowConf directory
- The PastDate_prompt_questionDate function in the Prompts.pf file located in the DatesAndValidation directory
- The SayInvalidDate_prompt function in the Prompts.pf file located in the DatesAndValidation directory.
Use data normalization functions to control how data is spoken in prompts. Typically, data normalization translates digits and symbols into text, which the prompt engine then matches to recorded speech in the prompts database. For an example of this use, see:
- The PastDate_prompt_acknowledge function in the Prompts.pf file located in the DatesAndValidation directory.
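A data normalization function of the kind described in the third purpose can be sketched as follows. This is not the code from the DatesAndValidation sample. It is a minimal illustration that speaks digits one at a time and spells out a few symbols; the function name `normalizeForSpeech` and both lookup tables are assumptions.

```javascript
// Minimal data normalization sketch: digits and a few symbols become words,
// so the resulting text can be matched to recorded extractions.
const DIGIT_WORDS = ["zero", "one", "two", "three", "four",
                     "five", "six", "seven", "eight", "nine"];
const SYMBOL_WORDS = { "%": "percent", "&": "and", "#": "number", "@": "at" };

function normalizeForSpeech(text) {
  return text
    .replace(/[0-9]/g, (d) => " " + DIGIT_WORDS[Number(d)] + " ")
    .replace(/[%&#@]/g, (s) => " " + SYMBOL_WORDS[s] + " ")
    .replace(/\s+/g, " ") // collapse the extra spaces added above
    .trim();
}
```

Speaking digits individually suits values such as area codes and phone numbers. Dates, times, and quantities need their own rules (for example, "14" as "fourteen" or "fourteenth"), which is why such functions are written per data type.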

Prompt select functions provide a convenient mechanism for defining all the possible prompts that each QA control can output. Additionally, because prompt functions have a standard interface, Prompt Validation can test these functions for coverage and completeness without actually executing the application. Prompt Function Editor is a powerful environment for prompt design and debugging.

Verifying Prompt Coverage

Verify prompt coverage before deploying an application. Coverage is the proportion of the prompts an application can produce that the prompt engine can play from recordings. To count as covered, the text of a prompt must be present in the database both as extractions and as sound recordings. For example, an application produces the prompt "Confirm that you have ordered <some number> pizzas." The number in this sentence can be any non-negative integer. The application might also prompt "Confirm that you have ordered no pizzas." If the word "no" is not both recorded and marked as an extraction, the application has incomplete coverage.

A professional-sounding application requires 100 percent prompt coverage. Use the Prompt Validation tool, in Speech Prompt Editor, to verify prompt coverage.
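The notion of coverage can be illustrated with a simplified check. The following sketch is a conceptual model, not a substitute for the Prompt Validation tool: it treats every extraction as a single word and counts a prompt as covered only when each of its words has a recording. The function name `promptCoverage` and the word-level matching are assumptions made for the example.

```javascript
// Conceptual coverage check; Prompt Validation is the supported tool.
// A prompt counts as covered when every word has a recorded extraction.
function promptCoverage(prompts, recordedWords) {
  const recorded = new Set(recordedWords.map((w) => w.toLowerCase()));
  const uncovered = prompts.filter((prompt) =>
    prompt
      .toLowerCase()
      .split(/[^a-z0-9']+/)
      .filter(Boolean)
      .some((word) => !recorded.has(word))
  );
  const coverage = prompts.length === 0
    ? 1
    : (prompts.length - uncovered.length) / prompts.length;
  return { coverage, uncovered };
}
```

If the recorded words are "confirm," "that," "you," "have," "ordered," "two," and "pizzas," then "Confirm that you have ordered two pizzas." is covered, while "Confirm that you have ordered no pizzas." is reported as uncovered because "no" was never recorded.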
