Speech Application Dialog Concepts

08/18/2014

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

Use Speech Server Developer Tools to design effective voice user interfaces for speech applications. An effective dialog is the key component of a successful interaction between a voice-only application and a user.

A voice-only application interacts with the user entirely without visual cues. The dialog flow must be intuitive and natural enough to simulate two humans conversing. It must also provide a user with enough context and supporting information to understand the next action step at any point in the application.

Dialog Styles

To add speech recognition and confirmation functionality to a voice-only application:

1. Decide which answers a user can give in response to a particular question. This process typically consists of deciding whether to implement a system-initiative or mixed-initiative dialog style.
2. Decide which prompts to provide at specific points in the application:
   - If you are using a system-initiative dialog style, a specific prompt typically follows a specific answer from a user, making this process fairly straightforward.
   - If you are using a mixed-initiative dialog style, any number of prompts can follow an answer from a user. This process involves determining which answers can follow a specific prompt and which prompt to provide, depending on the answers from the user.

System-Initiative Dialog Style

Using the system-initiative style, a sequence of specific questions or prompts guides a user through an application. The application asks the user a question and accepts only an answer to that specific question. This style uses sequential dialog. Each question and answer cycle consists of one question and one answer. System-initiative dialogs are typically simpler to design than those using mixed-initiative dialog, but they limit the amount of flexibility a user has when answering questions.
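
The sequential question and answer cycle described above can be sketched as follows. This is a minimal illustration, not Speech Server code: the `recognize` callback, the slot names, and the prompts are all hypothetical stand-ins for playing a prompt and recognizing one answer against a grammar that covers only that question.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical question list: (slot name, prompt). Not a Speech Server API.
QUESTIONS: List[Tuple[str, str]] = [
    ("origin", "Where are you flying from?"),
    ("date", "On what date?"),
]

def run_system_initiative(recognize: Callable[[str, str], str]) -> Dict[str, str]:
    """Ask each question in order and accept exactly one answer per question."""
    answers: Dict[str, str] = {}
    for slot, prompt in QUESTIONS:
        # recognize() stands in for playing the prompt and recognizing
        # speech that answers only this one question.
        answers[slot] = recognize(slot, prompt)
    return answers

# Simulated user: scripted answers keyed by slot.
scripted = {"origin": "Seattle", "date": "May 3"}
result = run_system_initiative(lambda slot, prompt: scripted[slot])
print(result)
```

Because the loop never looks beyond the current slot, anything extra the user says is ignored, which is the loss of flexibility noted above.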

Mixed-Initiative Dialog Style

Using the mixed-initiative style, a user can answer multiple questions at once. The application can accept an answer in response to a specific question, but it can also accept extra answers that apply to questions the application has not yet asked. This style enables non-sequential dialog. Each question and answer cycle includes one question and one or more answers. Mixed-initiative dialogs are typically more difficult to design than system-initiative dialogs, but they provide users with greater flexibility when answering questions. Mixed-initiative dialogs simulate human interaction more closely than system-initiative dialogs.
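
One way to picture the non-sequential cycle is a set of slots that any answer may fill, with the next prompt chosen from whatever is still missing. The sketch below assumes hypothetical slot names and prompts, and assumes the recognizer has already turned the utterance into slot and value pairs; none of these names come from Speech Server.

```python
from typing import Dict, Optional

# Hypothetical slots and prompts, in the order the application would ask them.
PROMPTS: Dict[str, str] = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "date": "On what date?",
}

def next_prompt(filled: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for slot, prompt in PROMPTS.items():
        if slot not in filled:
            return prompt
    return None

def accept(filled: Dict[str, str], recognized: Dict[str, str]) -> None:
    """Store every recognized slot, including ones not yet asked about."""
    for slot, value in recognized.items():
        if slot in PROMPTS:
            filled[slot] = value

filled: Dict[str, str] = {}
first = next_prompt(filled)
# The user answers the origin question and volunteers the date as well.
accept(filled, {"origin": "Seattle", "date": "May 3"})
second = next_prompt(filled)
print(first, "->", second)
```

After the first answer, the application skips the date question entirely and asks only for the destination.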

Confirming and Rejecting Answers

Consider three basic strategies when deciding how to confirm, correct, or reject a user's responses:

Explicit confirmation (EC) strategy. EC is the most basic form of confirmation. Of the three strategies, EC takes the most user time because it adds an extra prompt to explicitly confirm information that the user has already provided. Use EC for situations where the cost of a misunderstanding is high.
```
Application: "Where are you flying from?" 
User: "Seattle" 
Application: "Did you say Seattle?" 
User: "Yes" 
```
The application has now confirmed the user's desired departure location, and can continue asking for further information about the user's flight.
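The EC exchange can be sketched as a loop that repeats the question until the user confirms the recognized value. The `ask` callback and the scripted replies are hypothetical stand-ins for prompt playback and recognition, and plain string comparison stands in for a Yes/No grammar.

```python
from typing import Callable, Iterator, List

def ask_with_explicit_confirmation(question: str, ask: Callable[[str], str]) -> str:
    """Repeat the question until the user confirms the recognized value."""
    while True:
        value = ask(question)
        reply = ask(f"Did you say {value}?")
        if reply.strip().lower() == "yes":
            return value

# Scripted replies standing in for recognition results.
replies: Iterator[str] = iter(["Boston", "No", "Seattle", "Yes"])
prompts_played: List[str] = []

def scripted_ask(prompt: str) -> str:
    prompts_played.append(prompt)
    return next(replies)

origin = ask_with_explicit_confirmation("Where are you flying from?", scripted_ask)
print(origin, len(prompts_played))
```

Every value costs at least two prompts, and a single misrecognition doubles that, which is why EC is the slowest of the three strategies.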
Implicit confirmation (IC) strategy. Using IC, the confirmation question combines with the next information retrieval question to form a single prompt. This strategy uses fewer prompts than explicit confirmation. Consider a flight booking scenario where the application obtains the city that the user is flying from, followed by the date. IC results in a dialog interaction of the following form.
```
Application: "Where are you flying from?" 
User: "Seattle" 
Application: "Flying from Seattle. On what date?"
```
If the user answers this question with a date, the answer implies that Seattle is correct, thereby confirming Seattle as the departure city. The grammar for IC is subtly different from the grammar for EC: it combines acceptance or denial of the previous value (in this case, the departure city) with the information for the next prompt. In the following dialog, the user first rejects Seattle with a bare "No", and later rejects a misrecognized city and supplies the correction in the same reply ("No, Vancouver").
```
Application: "Flying from Seattle. On what date?" 
User: "No" 
Application: "Where are you flying from?" 
User: "Vancouver" 
Application: "Flying from Boston. On what date?" 
User: "No, Vancouver" 
Application: "Flying from Vancouver. On what date?"
```
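The three kinds of reply that an IC prompt has to handle (a bare denial, a denial with a correction, and an answer to the new question) can be sketched as follows. This is only an illustration: a real application would express these alternatives in a speech grammar, and the function name, the action labels, and the string matching here are assumptions.

```python
from typing import Dict

def interpret_ic_reply(reply: str, pending_city: str) -> Dict[str, str]:
    """Classify a reply to a prompt such as 'Flying from Seattle. On what date?'."""
    text = reply.strip()
    lowered = text.lower()
    if lowered == "no":
        # Denial only: the application has to ask for the city again.
        return {"action": "reask_city"}
    if lowered.startswith("no,"):
        # Denial plus correction: confirm the corrected city implicitly.
        return {"action": "confirm_city", "city": text[3:].strip()}
    # Anything else is treated as the answer to the new question (the date),
    # which implicitly accepts the pending city.
    return {"action": "accept", "city": pending_city, "date": text}

print(interpret_ic_reply("No", "Seattle"))
print(interpret_ic_reply("No, Vancouver", "Boston"))
print(interpret_ic_reply("May 3", "Vancouver"))
```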
Short time-out confirmation (STC) strategy. Using STC, the confirmation prompt echoes the value of the SemanticItem, either as a statement or as a question, and the application interprets silence as acceptance of the confirmation.
STC uses a shorter silence time-out than the rest of the dialog. For example, if the normal silence time-out is three seconds, the time-out for STC might be one second. The application does not expect a response. Instead, it states its understanding to the user and invites a correction. Assuming that the system is correct most of the time, the dialog flow moves quickly and smoothly with STC.
```
Application: "Which city do you want to fly to?" 
User: "Seattle." 
Application: "Seattle." 
User: (silence)
Application: "At what time do you want to fly?"
```
Because the user did not correct the system when it repeated the value, the application accepts the value "Seattle."
The grammar for the STC interaction is identical to the grammar for the EC interaction. At a minimum, the grammar should support Yes or No on its own, or Yes or No followed by a corrected value, such as "No, Seattle."
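
The STC decision can be sketched as a function that treats silence as acceptance and otherwise applies the minimal Yes/No grammar just described. The function name, the one-second constant taken from the example above, and the use of `None` to represent a silence time-out are assumptions for illustration; string matching stands in for a real grammar.

```python
from typing import Optional

STC_TIMEOUT_S = 1.0  # shortened silence time-out, per the one-second example

def resolve_stc(echoed_value: str, reply: Optional[str]) -> Optional[str]:
    """Return the confirmed value, or None if the application must ask again.

    reply is None when the user stayed silent for STC_TIMEOUT_S seconds.
    """
    if reply is None or reply.strip() == "":
        return echoed_value  # silence counts as acceptance
    text = reply.strip()
    lowered = text.lower()
    if lowered == "yes":
        return echoed_value
    if lowered.startswith(("yes,", "no,")):
        # Yes or No followed by a corrected value.
        return text.split(",", 1)[1].strip()
    return None  # bare "No" or anything else: ask again

print(resolve_stc("Seattle", None))
print(resolve_stc("Seattle", "No, Boston"))
print(resolve_stc("Seattle", "No"))
```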
