Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Initialize and Manage a Speech Recognition Engine (Microsoft.Speech)

You can use the SpeechRecognitionEngine class to manage any speech recognition engine that is installed on the system. Typically, you use a SpeechRecognitionEngine object to do the following:

  • Initialize a speech recognition engine and select a recognizer to use for speech recognition.

  • Configure and monitor the input to the speech recognition engine.

  • Configure parameters for recognition.

  • Register for notification of events and handle the events.

  • Load and unload speech recognition grammars.

  • Start, pause, and stop recognition operations.

Initialize a Speech Recognition Engine

You can use the parameters provided by constructors of the SpeechRecognitionEngine class to initialize an instance of a speech recognition engine and to select a recognizer that matches specific criteria, such as language-culture, the recognizer name, and other attributes.

A recognizer is an installed Runtime Language for speech recognition. The Microsoft Speech Platform Runtime 11 and Microsoft Speech Platform SDK 11 do not include any Runtime Languages for speech recognition. You must download and install a Runtime Language for each language in which you want to recognize speech. A Runtime Language includes the language model, acoustic model, and other data necessary to provision a speech engine to perform speech recognition in a particular language. See InstalledRecognizers() for more information.
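
As a minimal sketch of the selection step, the following example lists the installed recognizers and then creates an engine for one culture. It assumes that an en-US Runtime Language is installed; the class name is hypothetical and chosen only for illustration.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition;

class InitializeRecognizer
{
    static void Main()
    {
        // List every recognizer (Runtime Language) installed on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);
        }

        // Assumption: an en-US Runtime Language is installed. The constructor
        // throws ArgumentException if no installed recognizer supports the culture.
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            Console.WriteLine("Selected: " + engine.RecognizerInfo.Name);
        }
    }
}
```

If you need a specific recognizer rather than any recognizer for a culture, you can instead pass one of the listed RecognizerInfo objects, or its Id string, to the constructor.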

Configure and Monitor the Input

You can configure the input to the SpeechRecognitionEngine to receive speech audio from a WAV stream, a WAV file, an audio stream, or from the default audio device on the system. See Audio Input for Recognition (Microsoft.Speech) for more information.

When the SpeechRecognitionEngine is receiving audio, you can monitor the incoming signal by querying the AudioState and AudioLevel properties and by registering a handler for the AudioSignalProblemOccurred event.
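
The following sketch shows one way to attach the input and watch the signal. It assumes an en-US Runtime Language and a working default audio device; the class name is hypothetical.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition;

class MonitorInput
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // Report problems that the engine detects in the incoming signal.
            engine.AudioSignalProblemOccurred += (sender, e) =>
                Console.WriteLine("Audio problem: {0} at level {1}",
                    e.AudioSignalProblem, e.AudioLevel);

            // Assumption: a default audio device is present. Alternatives are
            // SetInputToWaveFile, SetInputToWaveStream, and SetInputToAudioStream.
            engine.SetInputToDefaultAudioDevice();

            // Query the current state and level of the audio input.
            Console.WriteLine("State: {0}, level: {1}",
                engine.AudioState, engine.AudioLevel);
        }
    }
}
```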

Configure Parameters for Recognition

To fine-tune how the recognizer responds to background noise and silence that accompany speech input, set the values of the BabbleTimeout, EndSilenceTimeout, and EndSilenceTimeoutAmbiguous properties.

Speech recognition operations produce multiple recognition result candidates, evaluate the accuracy of each result candidate with respect to the spoken input, and return the recognition candidate that most likely matches the received speech. You can control the number of alternate recognition results that the speech engine returns by setting the MaxAlternates property. You can also query the settings of a speech recognition engine that affect recognition, such as confidence thresholds, using the QueryRecognizerSetting(String) method and modify those settings with one of the UpdateRecognizerSetting() methods.
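
The following sketch sets these properties and reads one recognizer setting. The timeout values are arbitrary illustrations, not recommendations, and the setting name "CFGConfidenceRejectionThreshold" is an assumption about what the installed recognizer exposes, which is why the query is wrapped in a try/catch block.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using Microsoft.Speech.Recognition;

class ConfigureRecognition
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // How long the engine accepts only background noise before it finalizes.
            engine.BabbleTimeout = TimeSpan.FromSeconds(3);

            // Silence required after unambiguous and ambiguous input, respectively.
            engine.EndSilenceTimeout = TimeSpan.FromMilliseconds(500);
            engine.EndSilenceTimeoutAmbiguous = TimeSpan.FromMilliseconds(750);

            // Upper limit on the number of alternate results returned.
            engine.MaxAlternates = 5;

            // Assumption: this setting name is supported by the recognizer.
            const string setting = "CFGConfidenceRejectionThreshold";
            try
            {
                Console.WriteLine("{0} = {1}", setting,
                    engine.QueryRecognizerSetting(setting));
                engine.UpdateRecognizerSetting(setting, 40);
            }
            catch (KeyNotFoundException)
            {
                Console.WriteLine("This recognizer does not expose " + setting);
            }
        }
    }
}
```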

Register for Events and Author Handlers

The SpeechRecognitionEngine automatically raises events that return information to your application about the incoming signal, grammar loading, speech detection, preliminary recognition results, final recognition results, and the end of a recognition operation. Your application can stay informed of the status and progress of recognition operations by registering for the SpeechRecognitionEngine's events. In the handlers for these events, you can write code that responds appropriately when the events are raised. See Use Speech Recognition Events (Microsoft.Speech).

The SpeechHypothesized, SpeechRecognized, SpeechRecognitionRejected, and RecognizeCompleted events all return a RecognitionResult object that contains detailed information about the results of recognition. This information includes the Text of the recognized word or phrase, the Semantics associated with the recognition, the Confidence score assigned by the speech recognition engine, as well as other information.
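
As an illustration, the following sketch registers handlers for these four events and prints part of each RecognitionResult. The two-phrase grammar, the class name, and the use of the default audio device are assumptions made for the example.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition;

class EventHandlers
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // Hypothetical grammar; its culture is set to match the recognizer.
            var builder = new GrammarBuilder { Culture = engine.RecognizerInfo.Culture };
            builder.Append(new Choices("start", "stop"));
            engine.LoadGrammar(new Grammar(builder));

            engine.SpeechHypothesized += (s, e) =>
                Console.WriteLine("Hypothesis: " + e.Result.Text);
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Recognized: {0} (confidence {1:F2})",
                    e.Result.Text, e.Result.Confidence);
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Rejected: " + e.Result.Text);
            engine.RecognizeCompleted += (s, e) =>
                Console.WriteLine("Completed. Cancelled: " + e.Cancelled);

            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak, then press Enter to exit.");
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```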

Load and Unload Grammars

Load a Grammar object using the LoadGrammar(Grammar) or LoadGrammarAsync(Grammar) methods. You can unload a specific Grammar object using the UnloadGrammar(Grammar) method, or unload all currently loaded Grammar objects with a call to UnloadAllGrammars(). If the SpeechRecognitionEngine is running, you can use one of the RequestRecognizerUpdate() methods to pause it before loading or unloading Grammar objects.
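
The following sketch swaps one grammar for another while recognition is running. It requests a recognizer update and makes the change in the RecognizerUpdateReached handler. The Build helper, the grammar names, and the phrases are hypothetical.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition;

class SwapGrammars
{
    // Hypothetical helper: builds a phrase-list grammar in the recognizer's culture.
    static Grammar Build(SpeechRecognitionEngine engine, string name, params string[] phrases)
    {
        var builder = new GrammarBuilder { Culture = engine.RecognizerInfo.Culture };
        builder.Append(new Choices(phrases));
        return new Grammar(builder) { Name = name };
    }

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            Grammar colors = Build(engine, "colors", "red", "green", "blue");
            Grammar sizes = Build(engine, "sizes", "small", "medium", "large");

            engine.LoadGrammar(colors);
            engine.SetInputToDefaultAudioDevice();

            // Raised after the engine pauses in response to RequestRecognizerUpdate.
            engine.RecognizerUpdateReached += (s, e) =>
            {
                engine.UnloadGrammar(colors);
                engine.LoadGrammar(sizes);
                Console.WriteLine("Grammars loaded: " + engine.Grammars.Count);
            };

            engine.RecognizeAsync(RecognizeMode.Multiple);
            engine.RequestRecognizerUpdate();

            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```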

Start, Pause, and Stop Recognition

To start a recognition operation, use one of the Recognize() or RecognizeAsync() methods.

To stop an asynchronous recognition operation, use the RecognizeAsyncCancel() or RecognizeAsyncStop() methods.

You can pause a running SpeechRecognitionEngine instance to update its configuration or to load and unload grammars using one of the RequestRecognizerUpdate() methods.
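
The following sketch contrasts a single synchronous recognition with continuous asynchronous recognition. The grammar, the five-second timeout, and the class name are assumptions made for the example.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition;

class StartAndStop
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            var builder = new GrammarBuilder { Culture = engine.RecognizerInfo.Culture };
            builder.Append(new Choices("yes", "no"));
            engine.LoadGrammar(new Grammar(builder));
            engine.SetInputToDefaultAudioDevice();

            // Synchronous: blocks for one recognition operation. The argument is
            // the initial silence timeout; the result is null if nothing is recognized.
            RecognitionResult result = engine.Recognize(TimeSpan.FromSeconds(5));
            Console.WriteLine(result != null ? "Heard: " + result.Text : "Nothing recognized.");

            // Asynchronous: returns immediately and reports results through events.
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Heard: " + e.Result.Text);
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Press Enter to stop.");
            Console.ReadLine();

            // RecognizeAsyncStop lets the current operation finish first;
            // RecognizeAsyncCancel ends recognition without waiting.
            engine.RecognizeAsyncStop();
        }
    }
}
```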

The SpeechRecognitionEngine can perform an additional mode of recognition (called emulation) during which it accepts text, rather than speech, as input. Emulated recognition can be useful for debugging grammars. The speech recognizer raises the SpeechDetected, SpeechHypothesized, SpeechRecognitionRejected, and SpeechRecognized events as if the recognition operation were not emulated. To initiate emulated recognition, call one of the EmulateRecognize() or EmulateRecognizeAsync() methods and pass in the text or the array of words that you want the engine to treat as input.
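
As a sketch of how emulation can exercise a grammar without a microphone, the following example passes two text phrases to EmulateRecognize(). The grammar and phrases are hypothetical, and calling SetInputToNull() first is an assumption about a convenient setup rather than a documented requirement.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition;

class EmulateInput
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            var builder = new GrammarBuilder { Culture = engine.RecognizerInfo.Culture };
            builder.Append(new Choices("red", "green", "blue"));
            engine.LoadGrammar(new Grammar(builder));

            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Recognized event: " + e.Result.Text);
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Rejected event raised.");

            // Assumption: no audio is needed for emulation, so detach the input.
            engine.SetInputToNull();

            // One phrase from the grammar and one that is not in it.
            foreach (string phrase in new[] { "green", "purple" })
            {
                RecognitionResult result = engine.EmulateRecognize(phrase);
                Console.WriteLine("\"{0}\" -> {1}", phrase,
                    result != null ? result.Text : "(no result)");
            }
        }
    }
}
```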