
Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

Speech Recognition API Overview

This page gives an overview of the speech recognition interfaces in the Microsoft Speech Platform and links to additional topics and examples.

ISpRecoContext is the main interface for speech recognition in the Speech Platform. ISpRecoContext is also an ISpEventSource, so it is the means by which a speech application receives notifications for the speech recognition events it has requested.

Manage the speech recognition engine

Applications can use the ISpRecognizer interface to manage the functionality of the speech recognition (SR) engine. To create an ISpRecoContext for an ISpRecognizer, the application must first call CoCreateInstance on the component CLSID_SpInprocRecognizer to create its own ISpRecognizer. The application must then call ISpRecognizer::SetInput (see also ISpObjectToken) to set up the audio input. Once the input is set, the application can call ISpRecognizer::CreateRecoContext to obtain an ISpRecoContext.
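The three calls described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name CreateRecognizerAndContext and its audioInput parameter are hypothetical, the sketch assumes the application has already initialized COM and chosen an audio input object, and CLSCTX_ALL and the TRUE format flag are illustrative choices.

```cpp
#ifdef _WIN32  // The Speech Platform is a Windows-only COM API.
#include <windows.h>
#include <sapi.h>   // Link with ole32.lib and sapi.lib.

// Hypothetical helper: creates an in-process recognizer, attaches an audio
// input, and obtains a recognition context.
// Assumptions: the caller has already called CoInitializeEx, and audioInput
// is an audio input object (token or stream) selected elsewhere.
// On success the caller owns both returned interfaces and must Release them.
HRESULT CreateRecognizerAndContext(IUnknown* audioInput,
                                   ISpRecognizer** recognizerOut,
                                   ISpRecoContext** contextOut)
{
    if (!audioInput || !recognizerOut || !contextOut) return E_POINTER;
    *recognizerOut = nullptr;
    *contextOut = nullptr;

    // Step 1: create the application's own recognizer.
    ISpRecognizer* recognizer = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_SpInprocRecognizer, nullptr, CLSCTX_ALL,
                                  IID_ISpRecognizer,
                                  reinterpret_cast<void**>(&recognizer));
    if (FAILED(hr)) return hr;

    // Step 2: set up the audio input. TRUE allows the input format to be
    // changed to one the engine accepts.
    hr = recognizer->SetInput(audioInput, TRUE);

    // Step 3: obtain a recognition context from the recognizer.
    ISpRecoContext* context = nullptr;
    if (SUCCEEDED(hr)) hr = recognizer->CreateRecoContext(&context);

    if (FAILED(hr)) {
        recognizer->Release();
        return hr;
    }
    *recognizerOut = recognizer;
    *contextOut = context;
    return S_OK;
}
#endif  // _WIN32
```

Returning both interfaces keeps the recognizer available for later calls, such as creating additional contexts.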

Register for event notifications

The next step is to register for notifications of the events that are of interest to your application. Because ISpRecoContext is also an ISpEventSource, which in turn is an ISpNotifySource, your application can call one of the ISpNotifySource methods on its ISpRecoContext to indicate where the events for that ISpRecoContext should be reported. It should then call ISpEventSource::SetInterest to indicate which events it wants to be notified about. The most important event is SPEI_RECOGNITION, which indicates that the ISpRecognizer has recognized some speech for this ISpRecoContext. See SPEVENTENUM for details on the other available speech recognition events.
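As an illustration, the sketch below picks one of the ISpNotifySource mechanisms (a Win32 event) and registers interest in SPEI_RECOGNITION only. The names RegisterForRecognition and kRecognitionInterest are hypothetical, and the Win32 event is just one possible choice; window messages or callbacks are set up through other ISpNotifySource methods.

```cpp
#ifdef _WIN32  // The Speech Platform is a Windows-only COM API.
#include <windows.h>
#include <sapi.h>

// SPFEI() converts an SPEVENTENUM value into the 64-bit flag form that
// SetInterest expects.
const ULONGLONG kRecognitionInterest = SPFEI(SPEI_RECOGNITION);

// Hypothetical helper: asks the context to signal a Win32 event whenever a
// recognition event is queued, and returns that event handle.
// The handle belongs to the context; the caller must not close it.
HRESULT RegisterForRecognition(ISpRecoContext* context, HANDLE* eventOut)
{
    if (!context || !eventOut) return E_POINTER;
    *eventOut = nullptr;

    // Choose where notifications are reported: a Win32 event in this sketch.
    HRESULT hr = context->SetNotifyWin32Event();
    if (FAILED(hr)) return hr;

    // Choose which events trigger a notification and which are queued.
    hr = context->SetInterest(kRecognitionInterest, kRecognitionInterest);
    if (FAILED(hr)) return hr;

    *eventOut = context->GetNotifyEventHandle();
    return S_OK;
}
#endif  // _WIN32
```

An application could then wait on the returned handle with WaitForSingleObject, or call ISpNotifySource::WaitForNotifyEvent on the context.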

Create and load grammars

The last setup step is for the speech application to create, load, and activate an ISpRecoGrammar. The recognition engine uses the contents of the ISpRecoGrammar to define the utterances that can be recognized, typically a limited set of words and phrases specific to your application. First, the application creates an ISpRecoGrammar using ISpRecoContext::CreateGrammar. Then the application loads the grammar by calling one of the ISpRecoGrammar::LoadCmdxxx methods, such as ISpRecoGrammar::LoadCmdFromFile. To activate the rules of the grammar so that recognition can start, the application calls ISpRecoGrammar::SetRuleState or ISpRecoGrammar::SetRuleIdState.
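The create, load, and activate sequence might look like the sketch below. The function name LoadAndActivateGrammar, the grammar ID value, and the grammarPath parameter are hypothetical; loading from a file with SPLO_STATIC is only one of the available load options.

```cpp
#ifdef _WIN32  // The Speech Platform is a Windows-only COM API.
#include <windows.h>
#include <sapi.h>

// Hypothetical helper: creates a grammar on the given context, loads it from
// a grammar file, and activates its rules.
// On success the caller owns *grammarOut and must Release it.
HRESULT LoadAndActivateGrammar(ISpRecoContext* context,
                               const wchar_t* grammarPath,
                               ISpRecoGrammar** grammarOut)
{
    if (!context || !grammarPath || !grammarOut) return E_POINTER;
    *grammarOut = nullptr;

    // The grammar ID is chosen by the application; 1 is an arbitrary value.
    const ULONGLONG kGrammarId = 1;
    ISpRecoGrammar* grammar = nullptr;
    HRESULT hr = context->CreateGrammar(kGrammarId, &grammar);
    if (FAILED(hr)) return hr;

    // Load the grammar; SPLO_STATIC means it will not be modified at run time.
    hr = grammar->LoadCmdFromFile(grammarPath, SPLO_STATIC);

    // Activate rules. A null rule name targets the rules the grammar marks
    // as active by default, rather than one named rule.
    if (SUCCEEDED(hr)) hr = grammar->SetRuleState(nullptr, nullptr, SPRS_ACTIVE);

    if (FAILED(hr)) {
        grammar->Release();
        return hr;
    }
    *grammarOut = grammar;
    return S_OK;
}
#endif  // _WIN32
```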

When recognition results come back to the application through the requested notification mechanism, the lParam member of the SPEVENT structure holds a pointer to an ISpRecoResult, from which the application can determine what was recognized and for which ISpRecoGrammar of the ISpRecoContext.
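A sketch of reading one recognition result after a notification arrives is shown below. The function name GetRecognizedText is hypothetical, and the sketch assumes the context registered interest in SPEI_RECOGNITION only, so no other event types need cleaning up (sphelper.h provides SpClearEvent for the general case).

```cpp
#ifdef _WIN32  // The Speech Platform is a Windows-only COM API.
#include <windows.h>
#include <sapi.h>
#include <string>

// Hypothetical helper: pulls one queued event from the context and, if it is
// a recognition, returns the recognized text and the ID of the grammar that
// produced it. Returns S_FALSE when no recognition event was available.
// Assumption: only SPEI_RECOGNITION interest was set on this context.
HRESULT GetRecognizedText(ISpRecoContext* context,
                          std::wstring* textOut,
                          ULONGLONG* grammarIdOut)
{
    if (!context || !textOut || !grammarIdOut) return E_POINTER;

    SPEVENT evt = {};
    ULONG fetched = 0;
    HRESULT hr = context->GetEvents(1, &evt, &fetched);
    if (FAILED(hr)) return hr;
    if (fetched == 0) return S_FALSE;
    if (evt.eEventId != SPEI_RECOGNITION ||
        evt.elParamType != SPET_LPARAM_IS_OBJECT) {
        return S_FALSE;  // Not expected under the assumption above.
    }

    // For a recognition event, lParam carries the ISpRecoResult pointer.
    ISpRecoResult* result = reinterpret_cast<ISpRecoResult*>(evt.lParam);

    // What was recognized.
    wchar_t* text = nullptr;
    hr = result->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE, TRUE,
                         &text, nullptr);
    if (SUCCEEDED(hr)) {
        *textOut = text;
        CoTaskMemFree(text);
    }

    // Which grammar it was recognized for.
    SPPHRASE* phrase = nullptr;
    if (SUCCEEDED(hr)) hr = result->GetPhrase(&phrase);
    if (SUCCEEDED(hr)) {
        *grammarIdOut = phrase->ullGrammarID;
        CoTaskMemFree(phrase);
    }

    result->Release();  // The event held a reference on the result.
    return hr;
}
#endif  // _WIN32
```

The returned grammar ID is the value the application passed to ISpRecoContext::CreateGrammar, which lets it tell apart results from several grammars on the same context.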

An ISpRecognizer can have multiple ISpRecoContexts associated with it, and each one can be notified in its own way of events pertaining to it. An ISpRecoContext can have multiple ISpRecoGrammars created from it, each one for recognizing different types of utterances.

In This Section