Speech Recognition Sample

The SpeechRecognizer service provides different ways to use speech recognition, depending on the complexity of the project or knowledge of the user.

This sample is provided in the C# language. You can find the project files for this sample at the following location under the Microsoft Robotics Developer Studio installation folder:

Samples\Technologies\Speech\SpeechRecognizer

This sample covers:

  • Speech Recognizer.

Prerequisites

Hardware

A microphone to record speech commands.

Software

The .NET Framework 3.0 runtime or later is required for the System.Speech libraries. A trained speech recognition profile is optional, but it improves recognition accuracy. Note that speech recognition is not available on all platforms. In particular, it is not available on some Intel Architecture 64-bit operating systems, including Windows XP 64-bit Edition and Windows Server 2003 Enterprise and Datacenter Editions. However, it is included with the 64-bit edition of Windows Vista.

Caution: The latest version of the Speech API (SAPI) is 5.3, which ships with Windows Vista. However, only version 5.1 is available for Windows XP. This means that Speech Recognition Grammar Specification (SRGS) grammar files are not supported under Windows XP, and an exception is generated if you try to load an SRGS file there. You can still use the simple dictionary format for grammar files on Windows XP, so speech recognition remains possible.
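The platform restriction above can be checked before choosing a grammar type. The following is a minimal sketch, not part of the sample: the class and method names are hypothetical, and it assumes that the Windows version number is an adequate proxy for the installed SAPI version (Windows XP reports 5.1, Windows XP 64-bit Edition and Windows Server 2003 report 5.2, and Windows Vista reports 6.0).

```csharp
using System;

// Hypothetical helper, not part of the SpeechRecognizer sample.
public static class GrammarSupport
{
    // Returns true when the given Windows version is Vista (6.0) or later.
    // Assumption: SAPI 5.3, and therefore SRGS support, is present on those versions.
    public static bool SupportsSrgs(Version osVersion)
    {
        return osVersion.Major >= 6;
    }

    // Applies the same check to the operating system the process is running on.
    public static bool CurrentOsSupportsSrgs()
    {
        OperatingSystem os = Environment.OSVersion;
        return os.Platform == PlatformID.Win32NT && SupportsSrgs(os.Version);
    }
}
```

A service could call GrammarSupport.CurrentOsSupportsSrgs() at startup and fall back to a dictionary-style grammar when it returns false, rather than waiting for the exception described above.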

You will also need Microsoft Internet Explorer or another conventional web browser.

Speech Recognizer

The SpeechRecognizer service is the core speech recognition service, as opposed to the SpeechRecognizerGui service, which provides a user interface for the core service. The core service supports simple dictionary-style grammars as well as more complex Speech Recognition Grammar Specification (SRGS) grammars, which are specified in XML.
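To illustrate what an SRGS grammar specified in XML looks like, the sketch below builds a minimal one from a list of phrases. It is an illustration under stated assumptions, not code from the sample: the class name, the rule id "command", the "en-US" language, and the choice of the "semantics/1.0" tag format are all assumptions, and whether the service needs additional rules or tags is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not part of the SpeechRecognizer sample.
public static class SrgsSketch
{
    // Namespace defined by the W3C SRGS 1.0 specification.
    static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    // Builds a grammar with one public rule that accepts any of the given phrases.
    // Each key is a spoken phrase; each value is returned as its semantic result.
    // Assumption: values contain no double quotes (they are not escaped here).
    public static XDocument BuildGrammar(IDictionary<string, string> commands)
    {
        var oneOf = new XElement(Srgs + "one-of");
        foreach (KeyValuePair<string, string> command in commands)
        {
            oneOf.Add(new XElement(Srgs + "item",
                command.Key,
                new XElement(Srgs + "tag", "out = \"" + command.Value + "\";")));
        }

        var grammar = new XElement(Srgs + "grammar",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XAttribute("root", "command"),
            new XAttribute("tag-format", "semantics/1.0"),
            new XElement(Srgs + "rule",
                new XAttribute("id", "command"),
                new XAttribute("scope", "public"),
                oneOf));

        return new XDocument(new XDeclaration("1.0", "utf-8", null), grammar);
    }
}
```

Calling BuildGrammar with a dictionary and then Save on the returned document writes a grammar file to disk. A dictionary-style grammar expresses the same phrase list with far less markup, which is why it is the simpler choice when no nested rules are needed.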

Step 1: Set the Initial State

The SpeechRecognizer service supports the Initial State partner. The initial state is used to configure:

  • What type of grammar is being used
  • What the grammar looks like or where it can be loaded from

The default configuration file must be named "SpeechRecognizer.config.xml". It specifies the commands that will be used by the recognizer.

The config file for a dictionary-style grammar could look as follows:

<?xml version="1.0" encoding="utf-8"?>
<SpeechRecognizerState xmlns:s="http://www.w3.org/2003/05/soap-envelope"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
    xmlns:d="http://schemas.microsoft.com/xw/2004/10/dssp.html"
    xmlns="http://schemas.microsoft.com/robotics/2008/02/speechrecognizer.html">
  <DictionaryGrammar>
    <Elem>
      <string>Hello world</string>
      <string>HelloWorld</string>
    </Elem>
  </DictionaryGrammar>
  <IgnoreAudioInput>false</IgnoreAudioInput>
  <GrammarType>DictionaryStyle</GrammarType>
</SpeechRecognizerState>
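As a quick way to check what a configuration file like the one above contains, the following sketch reads the dictionary entries back out. It is not part of the sample: the class and method names are hypothetical, and it assumes that each Elem holds exactly two string elements, the first being the spoken phrase and the second the value associated with it. Elements are matched by local name so that the XML namespace does not have to be hard-coded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the SpeechRecognizer sample.
public static class SpeechConfigReader
{
    // Parses the configuration XML and returns phrase/value pairs from DictionaryGrammar.
    // Assumption: each Elem contains two string children (phrase, then value).
    public static Dictionary<string, string> ReadDictionaryGrammar(string configXml)
    {
        var result = new Dictionary<string, string>();
        XDocument doc = XDocument.Parse(configXml);

        XElement grammar = doc.Root.Elements()
            .FirstOrDefault(e => e.Name.LocalName == "DictionaryGrammar");
        if (grammar == null)
        {
            return result; // No dictionary grammar configured.
        }

        foreach (XElement elem in grammar.Elements().Where(e => e.Name.LocalName == "Elem"))
        {
            List<string> pair = elem.Elements()
                .Where(e => e.Name.LocalName == "string")
                .Select(e => e.Value.Trim())
                .ToList();
            if (pair.Count == 2)
            {
                result[pair[0]] = pair[1]; // Skip entries that do not match the assumed shape.
            }
        }
        return result;
    }
}
```

Passing the contents of SpeechRecognizer.config.xml (for example, read with File.ReadAllText) to ReadDictionaryGrammar returns the configured phrases, which is a convenient sanity check before starting the service with an edited file.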

Step 2: Start and Run the Sample

Start the DSS Command Prompt from the Start > All Programs menu.

Start a DssHost node and create an instance of the service by typing the following command:

dsshost /p:50000 /m:"samples\config\SpeechRecognizer.manifest.xml"

This starts the service and you see a response like the following:

* Starting manifest load: file:///C:/.../samples/config/SpeechRecognizer.manifest.xml
[03/28/2008 14:57:30][http://localhost:50000/manifestloaderclient]
* Manifest load complete [03/28/2008 14:57:31][http://localhost:50000/manifestloaderclient]
* Service started [03/28/2008 14:57:36][http://localhost:50000/speechrecognizer]

Step 3: Start the GUI to Configure Speech Recognition

The SpeechRecognizer service itself does not expose a user interface, which makes it hard to test without writing your own service or VPL diagram. Its sibling service, SpeechRecognizerGui, provides a web interface through which you can configure simple dictionary-style grammars or upload more complex SRGS grammar files written in XML. Once you have a DSS node running, you can start an instance of SpeechRecognizerGui by opening a web browser and going to the Control Panel page.

Once the SpeechRecognizerGui is running, browse to the web page for the service. This is shown in the figure below. The web interface shows events such as speech detected or speech recognized in a scrolling area that can be cleared.

Speech Recognizer - GUI Service

At the bottom of the SpeechRecognizerGui web page you can define a grammar. Note that the SpeechRecognizer only recognizes words and phrases that are in its grammar. If the grammar is empty, then nothing will be recognized.

Note: The screenshot above shows a simple dictionary-style grammar. This can be used on either Windows XP or Windows Vista. If you change the grammar type to SRGS file, you will not be able to use speech recognition on Windows XP, because it does not support this file format.

Summary

In this sample, we covered:

  • Speech Recognizer.

See Also

User Interface: Microsoft Speech API

Technology Samples: Text-To-Speech Service

VPL User Interface Services: Speech Recognition

© 2012 Microsoft Corporation. All Rights Reserved.