Types of Systems

  Microsoft Speech Technologies Homepage

There are two broad categories of speech systems: voice-only and multimodal. The type of system under development affects the choices available for speech system design.

Voice-Only Applications

Voice-only applications are often referred to as telephony applications. The user listens to application prompts and responds verbally or with keypresses. Types of voice-only applications are described in the following sections.

Dual Tone Multi-Frequency

Dual tone multi-frequency (DTMF) interfaces are interactive telephone systems that use recorded prompts to direct the user to select an option from a menu of choices using keypresses on touch-tone telephones.
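The core of a DTMF interface is a mapping from touch-tone keypresses to menu options. The sketch below illustrates that routing step in Python; the names (`MENU`, `handle_keypress`) are illustrative assumptions, not part of any SDK.

```python
# Illustrative DTMF menu routing: touch-tone digits map to menu options.
# A real telephony system would receive digits from the phone network.

PROMPT = "For billing, press 1. For support, press 2. To repeat this menu, press 9."

MENU = {
    "1": "billing",
    "2": "support",
    "9": "repeat",
}

def handle_keypress(digit: str) -> str:
    """Return the route for a touch-tone digit, or 'invalid' to trigger a re-prompt."""
    return MENU.get(digit, "invalid")
```

An unrecognized digit falls through to a re-prompt, which is the usual recovery strategy in touch-tone menus.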

Interactive Voice Response

Interactive Voice Response (IVR) has historically referred to DTMF applications. Currently this term may also describe speech systems generally.

Conversational or Natural Language

Conversational or Natural Language (NL) systems apply normal, everyday speech to human-computer interaction. NL is an alternative to command-based languages. NL also refers to design considerations such as conversational style, implied personality, and awareness of basic social rules on the part of the system.

Multimodal Applications

Currently, most multimodal interfaces enhance the traditional graphic user interface (GUI) with speech capabilities. Widespread standards have not yet developed around the design and implementation of multimodal interfaces, but there are a few common approaches in evidence today.

In traditional GUI interactions, the user directs the system by a combination of selection and command. These direct manipulations are effectively simple sentence structures: a verb acting on a direct object. The mouse selection (a highlighted document, for example) defines the direct object of the command sentence, while the menu selection describes the action. For example, by selecting a document and choosing Print from the File Menu, the user tells the computer, "Print this document."

In multimodal systems, the language of interaction can be a combination of two modalities. Speech and mouse inputs combine to form commands that are more complex. For example, by selecting a document and saying "Print two copies of THIS" simultaneously, the user has collapsed several direct manipulations into a single click and utterance.
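The print example above can be sketched as a simple fusion step: the spoken utterance supplies the verb and quantity, while the mouse selection resolves the deictic reference ("this"). The keyword matching below is deliberately naive, and all names here are hypothetical, not an SDK API.

```python
from dataclasses import dataclass

@dataclass
class Selection:
    """The current mouse selection, e.g. a highlighted document."""
    object_id: str

# Minimal vocabulary for quantities in the spoken command.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}

def fuse(selection: Selection, utterance: str) -> dict:
    """Combine a mouse selection and a spoken utterance into one command."""
    words = utterance.lower().split()
    verb = "print" if "print" in words else "unknown"
    copies = next((NUMBER_WORDS[w] for w in words if w in NUMBER_WORDS), 1)
    # The deictic "this" resolves to whatever the mouse has selected.
    return {"verb": verb, "object": selection.object_id, "copies": copies}
```

Fusing the two modalities lets one click and one utterance replace a sequence of direct manipulations.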

Tap-and-Talk

Tap-and-talk commonly refers to speech-enabled applications that enable users to click on parts of the interface, such as a drop-down list or text entry field, and use speech as an alternative mode of input. This interaction style is particularly useful for handheld devices with limited screen size and text entry capabilities.

Hands-Free

In settings where users must have their hands free for other tasks, multimodal systems are proving to be beneficial. A number of companies in the health care, service and manufacturing sectors have successfully deployed systems that feature speech input and graphic output.

Other Variations or System Dimensions

There are a variety of other dimensions that can be used to describe speech applications.

Personalized or Anonymous

Systems can also differ in their response to user identity. A user's relationship with a particular system is either anonymous or personalized. Personalized systems require users to log on each time they use the application or be recognized by caller ID. Because the system knows the identity of the user, it can access and update that individual's interaction history with each visit or session. This update process enables the system to adapt as it gathers more knowledge of the user.
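The personalized/anonymous distinction amounts to whether a session can be keyed to an identity such as a caller ID. A minimal sketch, assuming a hypothetical in-memory history store (a real system would persist this):

```python
from typing import Optional

# Hypothetical per-caller interaction history, keyed by caller ID.
HISTORY: dict = {}

def start_session(caller_id: Optional[str]) -> str:
    """Classify a session: anonymous when no caller ID is available,
    otherwise new or returning based on stored history."""
    if caller_id is None:
        return "anonymous"
    visits = HISTORY.setdefault(caller_id, [])
    visits.append("session")
    return "returning" if len(visits) > 1 else "new"
```

An anonymous caller gets no history; a recognized caller's record grows with each session, which is what allows the system to adapt over time.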

Adaptive or Static

Systems can be either adaptive or static. Adaptive systems use observed information about the world or individual users to effect changes in the system's behavior. Static systems do not adapt over time. They exhibit consistent behaviors to all users.
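One common adaptive behavior is prompt tapering: verbose prompts for novices, terse prompts once the system has observed enough use. The classes below are a hedged illustration of the adaptive/static contrast, not a prescribed design.

```python
class AdaptivePrompter:
    """Adaptive behavior: prompt verbosity changes with observed usage."""

    def __init__(self, novice_sessions: int = 3):
        self.novice_sessions = novice_sessions
        self.sessions_observed = 0

    def prompt(self) -> str:
        self.sessions_observed += 1
        if self.sessions_observed <= self.novice_sessions:
            return "Welcome. You can say 'weather', 'traffic', or 'sports'."
        return "Main menu."


class StaticPrompter:
    """Static behavior: identical prompts for all users, every time."""

    def prompt(self) -> str:
        return "Welcome. You can say 'weather', 'traffic', or 'sports'."
```

The static version is simpler and perfectly predictable; the adaptive version trades that consistency for reduced prompt fatigue among experienced users.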

Frequency and Longevity

Systems also vary by frequency of use among repeat users. Some systems cater to a high ratio of first-time users from the general public. Other systems are designed for repeated use by experienced users. For example, some users of subscriber-based services might access a system many times a day. Such applications typically employ design strategies that maximize longevity and minimize user fatigue.

Use of Speech in Various System Types

Various types of systems differ in their use of speech. Some, but not all, of the following functions are possible using the Speech Application SDK.

Task-based Systems

Task-based systems take users through multiple-step processes that often require data collection, confirmation and information output to achieve their goals. Today, most speech systems are task-oriented. Thousands of task-based speech recognition (SR) systems are now deployed across multiple business sectors, allowing individuals to perform a diverse range of tasks easily and effectively over the telephone while enabling businesses to cut costs and take advantage of their existing telephone infrastructures. In many cases, these systems are second-generation solutions replacing previously deployed DTMF systems. While DTMF systems were economically compelling for corporations, users often viewed them as frustrating and confusing.

Voice Portals

Voice portals are the speech system equivalent of Internet portals. Combining Internet browsing with the telephone, they allow users to access information on a variety of topics corresponding to common Internet queries, such as driving directions, sports scores, stock quotes, traffic information, weather, horoscopes or restaurant locations.

Automotive Applications

Speech applications appear in many of the most advanced automotive models, allowing for voice access to increasingly complex automotive control systems. These systems aim to enhance safety in the highly complex environment of the car, where both eyes and hands are busy. Some of these applications are multimodal, combining speech input with direct manipulation. These systems typically employ a mix of graphic and voice outputs.

Navigational systems, particularly global positioning system (GPS) applications designed for automobiles, use extensive speech output. Many of these systems now accept simple voice commands as input.

Medical Applications

Medical applications of speech systems have been in place for over a decade. Doctors have been using traditional dictation devices because their work so often requires hands-free use. Using voice recognition systems to capture medical histories and records represents a leap forward from dictation systems because voice recognition systems eliminate the time-consuming transcription step.

Handheld Devices

Handheld devices such as Pocket PCs and cell phones are targets for speech recognition and multimodal applications, due in part to their increasing popularity and in part to the constraints of their limited visual displays.

Toys

Children's toys now employ low-bandwidth speech systems to create interactive and programmable toys. Companies have produced responsive dolls and robots, in addition to more advanced programmable learning toys. Future directions for programmable toys will include the increased use of communications technologies.

Universal Access

Universal Access speech applications provide access to computers and networks for those people who are unable to access technology in a typical fashion. This group includes the blind and visually impaired, as well as those people with physical disabilities or injuries who may be unable to use a mouse or keyboard.

See Also

Designing Speech Applications