ISpeechRecoContext Interface (SAPI 5.3)
The ISpeechRecoContext automation interface defines a recognition context.
For a list of available methods and properties, see Method/Property List.
What is a Recognition Context?
A recognition context is the primary means by which an application interacts with SAPI for speech recognition. It is the object through which an application starts and stops recognition and receives recognition results and other events. It also controls which words and phrases are available for the user to speak. An application may have several recognition contexts open at the same time, each controlling a different part of the application. A given recognition context controls one collection of available words and is associated with a specific part of the application. That word collection is the boundary of a recognition attempt: the engine searches only within it for matches, and words outside the collection are never returned for that attempt. By setting recognition contexts, applications limit or expand the scope of the words needed for a particular aspect of the application. This granularity improves the quality of recognition by removing words not needed at that moment; conversely, it also allows words to be added when they are needed.
For example, an application may have only one recognition context: all the words in the dictionary, available all the time. If the application were purely dictation, this one-context model would work well: the user could say any word at any time and it would probably be recognized successfully. However, if the application gained a new requirement to exit when the user says "close," the one-context model breaks down. The user would be disappointed if, in the course of dictation, the word "close" were spoken and the application suddenly stopped and closed.
Clearly, there are two uses (or contexts) for the word "close." The first is as an ordinary word in dictation ("please close the door," "that was too close for comfort," "we'll close in on the criminal"). The second is as a specific command. There must be a way to differentiate the two, and a recognition context permits applications to do that.
Applications may have more than one recognition context; in fact, it is recommended to have as many as makes sense. For example, one recognition context may be assigned to the menu bar, another to the dictation screen, and yet another to dialog boxes, even temporary ones such as a Yes/No/Cancel dialog box. Programmers need to decide the scope of each recognition context. The menu system for an application may even have multiple recognition contexts, perhaps one for each menu bar item. This granularity lets applications concentrate recognition resources where they are needed. For example, a small menu may have only 12 items associated with it, and those would be 12 very specific words. It makes little sense, therefore, to have the entire dictation collection, some 65,000 to 100,000 words, available when only 12 words are needed. The larger-than-needed vocabulary would not only take more processing time, but could also result in more mismatched words. By the same reasoning, in the "close" example above, a dictation model should treat the word "close" no differently than any other word; two recognition contexts can be used to separate the two uses.
Using Recognition Contexts
Creating a recognition context is a two-step process: the context must be declared and then created. The following code sample creates an instance of a recognition context named RC. The keyword New creates a reference to a new object of the specified class.
' Declare the context at module level so its events can be handled.
Public WithEvents RC As SpSharedRecoContext

' Create the instance (for example, in Form_Load).
Set RC = New SpSharedRecoContext
Recognition context types
Recognition contexts may be one of two types: shared or in process (InProc). A shared context allows its resources to be used by other recognition contexts or applications. All applications on the machine that use shared recognition contexts share a single audio input and SR engine, into which each context's grammars are loaded. When the user speaks, the SR engine performs recognition, and SAPI decides which context to send the recognition result to, based on which grammar the result best matches. In general, most applications should use shared contexts. The following code snippet declares a shared recognition context.
Public WithEvents RC As SpSharedRecoContext
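Building on the declaration above, a minimal sketch (the form and grammar names are illustrative) creates the context when the application's window loads; results then arrive through event handlers such as the one shown under Events below.

Private myGrammar As ISpeechRecoGrammar

Private Sub Form_Load()
    ' The shared engine and audio input are reused by every
    ' shared context on the machine.
    Set RC = New SpSharedRecoContext
    Set myGrammar = RC.CreateGrammar
    myGrammar.DictationSetState SGDSActive
End Sub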
InProc contexts restrict the available resources to a single context or application. That is, an SR engine or microphone used by an InProc recognition context may not be used by any other application. Use an InProc context in situations that demand the highest performance, the fastest response time, or exacting recognition quality. InProc contexts are also important for embedded systems on other hardware platforms, and they are used for non-microphone recognition, such as recognizing from a file. However, InProc contexts should be used sparingly, since they exclude other applications from the speech recognition resources. The following code snippet declares an InProc recognition context.
Public WithEvents RC As SpInProcRecoContext
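Building on the declaration above, the following sketch recognizes from a wave file rather than a microphone (the file name sample.wav is an assumption for illustration).

Private Recognizer As SpInprocRecognizer
Private FileStream As SpFileStream

Private Sub Form_Load()
    ' An InProc recognizer owns its resources, so the audio input
    ' must be supplied explicitly; here, a wave file.
    Set Recognizer = New SpInprocRecognizer
    Set FileStream = New SpFileStream
    FileStream.Open "sample.wav", SSFMOpenForRead
    Set Recognizer.AudioInputStream = FileStream
    ' Create the context directly from the InProc recognizer.
    Set RC = Recognizer.CreateRecoContext
End Sub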
Both types are based on ISpeechRecoContext. Either declaration should include the keyword WithEvents so that the recognition context also supports events.
Defaults
A recognition context is created with sensible defaults taken from the computer's settings. These defaults are assigned using Speech properties in Control Panel. Applications may override these values on a particular context for specific reasons, but they should not change the system-wide defaults themselves. The defaults include the following; a brief example of overriding them on a context appears after the list:
- Recognizer to determine the speech recognition engine
- EventInterests to determine which events the speech recognition engine generates
- RetainedAudio to persist the actual audio for the speech
- RetainedAudioFormat to determine the retained audio format
- Voice to speak the text
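For instance, a context could opt in to retained audio or narrow its event set (a sketch; the property values shown are illustrative, and RC is assumed to be a valid context):

' Keep the audio behind each result so it can be played back later.
RC.RetainedAudio = SRAORetainAudio
' Ask the engine to generate only the events this application uses.
RC.EventInterests = SRERecognition Or SRESoundStart Or SRESoundEnd
' The context's default Voice can speak feedback to the user.
RC.Voice.Speak "Ready"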
Grammars
The only resource that must be explicitly created is the grammar, using CreateGrammar. The grammar defines the set of words for the recognition context. Grammars may be one of two types: dictation, or command and control (C and C). A dictation grammar is usually an unrestricted word list designed to encompass the full range of words in a language. Dictation allows any word or phrase to be spoken; it is used in the traditional sense, to dictate a letter or a paper, for example. The following code snippet declares a dictation grammar. It assumes a valid RC recognition context.
Dim myGrammar As ISpeechRecoGrammar
Set myGrammar = RC.CreateGrammar
' Activate dictation so any word or phrase can be recognized.
myGrammar.DictationSetState SGDSActive
A command and control grammar is a limited word list restricting the speaker to a small set of words. In this way, users can speak a command, usually a single word or short phrase, with a greater chance of recognition; words not on the list are simply not considered. A C and C grammar is useful for speech-enabling menus, for example. Menu grammars are typically small, with exact word or phrase commands such as "New," "Exit," or "Open." The following code snippet declares a command and control grammar. It assumes a valid RC recognition context.
Dim myGrammar As ISpeechRecoGrammar
Set myGrammar = RC.CreateGrammar
' Load the command list and activate the rule with ID 101.
myGrammar.CmdLoadFromFile "sol.xml", SLODynamic
myGrammar.CmdSetRuleIdState 101, SGDSActive
Because the word list is limited, it must be supplied explicitly; in this case it is loaded from the command file sol.xml. In addition, the code sample activates one rule, identified by the value 101. For a more thorough discussion of grammars and grammar design, see Text grammar format.
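A command file of the following form would satisfy the snippet above. The contents are hypothetical (the actual sol.xml ships with the SAPI samples); what matters is that rule 101 holds a small, explicit list of phrases:

<GRAMMAR LANGID="409">
    <RULE NAME="MenuCommands" ID="101" TOPLEVEL="ACTIVE">
        <L>
            <P>new</P>
            <P>open</P>
            <P>exit</P>
        </L>
    </RULE>
</GRAMMAR>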
States
While individual grammar rules may be activated or deactivated as conditions change, all the grammars in a recognition context may also be activated or deactivated together with the State property. Grammars can be turned off, for example, when the window loses focus, and turned back on when the window becomes foremost again. The recognition context may also be momentarily stopped and then restarted: the Pause method halts speech recognition temporarily so that the engine can synchronize with grammar changes, and Resume restarts the recognition process. While paused, the engine continues to accept and buffer sound input, provided the pause is not excessive (by default, no more than 30 seconds).
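A sketch of both patterns, assuming the RC context and myGrammar grammar from the earlier snippets (the form event handlers and the UpdateRules routine are illustrative):

' Disable all of this context's grammars while the window is not in focus.
Private Sub Form_LostFocus()
    RC.State = SRCS_Disabled
End Sub

Private Sub Form_GotFocus()
    RC.State = SRCS_Enabled
End Sub

' Pause the engine, change a rule, and resume recognition.
Private Sub UpdateRules()
    RC.Pause
    myGrammar.CmdSetRuleIdState 101, SGDSInactive
    RC.Resume
End Sub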
The ISpeechRecoContext object is always associated with a single speech recognition engine (also called a recognizer). However, a single recognizer may have many recognition contexts.
Events
As a result of interactions with the recognition context, the SR engine sends back certain information to the application using the Events mechanism. An event is a specific occurrence that might be of interest to the user or application. Examples of events include notifying the application of a successful recognition or indicating that a designated position in the stream has been reached. Regardless, the application is free to process events or ignore them.
In addition, events may be filtered, allowing the engine to return some or all events, or to prevent an event from being generated in the first place if it has no significance to the application. Filtering is controlled by EventInterests.
A complete list of events is described in ISpeechRecoContext events.
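For example, an application that declared RC WithEvents receives successful recognitions through the Recognition event (the Debug.Print body is illustrative):

' Fired when the engine recognizes a phrase from this context's grammars.
Private Sub RC_Recognition(ByVal StreamNumber As Long, _
                           ByVal StreamPosition As Variant, _
                           ByVal RecognitionType As SpeechRecognitionType, _
                           ByVal Result As ISpeechRecoResult)
    ' Retrieve and display the recognized text.
    Debug.Print Result.PhraseInfo.GetText
End Sub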
Automation Interface Elements
The ISpeechRecoContext automation interface contains the following elements:
Properties
- AllowVoiceFormatMatchingOnNextSet Property
- AudioInputInterferenceStatus Property
- CmdMaxAlternates Property
- EventInterests Property
- Recognizer Property
- RequestedUIType Property
- RetainedAudio Property
- RetainedAudioFormat Property
- State Property
- Voice Property
- VoicePurgeEvent Property

Methods
- Bookmark Method
- CreateGrammar Method
- CreateResultFromMemory Method
- Pause Method
- Resume Method
- SetAdaptationData Method