Note

Please see the Azure Cognitive Services Speech documentation for the latest supported speech solutions.

How to define custom recognition constraints (XAML)

Learn how to define and use custom constraints for speech recognition.

Note  Voice commands and speech recognition are not supported by Windows Store apps in Windows 8 and Windows 8.1.

 

Speech recognition requires at least one constraint to define a recognizable vocabulary. If no constraint is specified, the predefined dictation grammar of Universal Windows apps is used. See Quickstart: Speech recognition.

What you need to know

Technologies

Windows.Media.SpeechRecognition

Prerequisites

This topic builds on Quickstart: Speech recognition.

To complete this tutorial, review that topic first to become familiar with the technologies discussed here.

Instructions

Step 1: Add constraints

Use the SpeechRecognizer.Constraints property to add constraints to a speech recognizer.

Here we cover the three kinds of speech recognition constraints used from within an app. (For voice command constraints, see Quickstart: Voice commands.)

Each speech recognizer can have one constraint collection. Only these combinations of constraints are valid:

  • A single topic constraint (a predefined dictation or web-search grammar). No other constraints are allowed.
  • A combination of list constraints and/or grammar-file constraints.

Remember: Call the SpeechRecognizer.CompileConstraintsAsync method to compile the constraints before starting the recognition process.
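
For example, the following sketch combines a list constraint with a grammar-file constraint in one collection, compiles them, and checks the compilation status before starting recognition. This is a minimal sketch: the handler name, the phrases, the constraint tags, and the file name commands.grxml are illustrative placeholders, not part of the speech recognition API.

private async void StartStopCommands_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Add a list constraint and a grammar-file constraint to the same collection.
    // "commands.grxml" is a placeholder grammar file included in the app package.
    var listConstraint = new Windows.Media.SpeechRecognition.SpeechRecognitionListConstraint(new string[] { "start", "stop" }, "startStop");
    var storageFile = await Windows.Storage.StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///commands.grxml"));
    var grammarFileConstraint = new Windows.Media.SpeechRecognition.SpeechRecognitionGrammarFileConstraint(storageFile, "commands");

    speechRecognizer.Constraints.Add(listConstraint);
    speechRecognizer.Constraints.Add(grammarFileConstraint);

    // Compile all constraints and confirm that compilation succeeded before recognizing.
    var compilationResult = await speechRecognizer.CompileConstraintsAsync();
    if (compilationResult.Status != Windows.Media.SpeechRecognition.SpeechRecognitionResultStatus.Success)
    {
        return;
    }

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
}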

Step 2: Specify a web-search grammar (SpeechRecognitionTopicConstraint)

Topic constraints (dictation or web-search grammar) must be added to the constraints collection of a speech recognizer.

Here, we add a web-search grammar to the constraints collection.

private async void WeatherSearch_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Listen for audio input issues.
    speechRecognizer.RecognitionQualityDegrading += speechRecognizer_RecognitionQualityDegrading;

    // Add a web search grammar to the recognizer.
    var webSearchGrammar = new Windows.Media.SpeechRecognition.SpeechRecognitionTopicConstraint(Windows.Media.SpeechRecognition.SpeechRecognitionScenario.WebSearch, "webSearch");

    speechRecognizer.UIOptions.AudiblePrompt = "Say what you want to search for...";
    speechRecognizer.UIOptions.ExampleText = @"Ex. 'weather for London'";
    speechRecognizer.Constraints.Add(webSearchGrammar);

    // Compile the constraint.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}

Step 3: Specify a programmatic list constraint (SpeechRecognitionListConstraint)

List constraints must be added to the constraints collection of a speech recognizer.

Keep the following points in mind:

  • You can add multiple list constraints to a constraints collection.
  • You can use any collection that implements IIterable<String> (projected as IEnumerable<string> in C#) for the string values.

Here, we programmatically specify an array of words as a list constraint and add it to the constraints collection of a speech recognizer.

private async void YesOrNo_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // You could create this array dynamically.
    string[] responses = { "Yes", "No" };

    // Add a list constraint to the recognizer.
    var listConstraint = new Windows.Media.SpeechRecognition.SpeechRecognitionListConstraint(responses, "yesOrNo");

    speechRecognizer.UIOptions.ExampleText = @"Ex. 'yes', 'no'";
    speechRecognizer.Constraints.Add(listConstraint);

    // Compile the constraint.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}

Step 4: Specify an SRGS grammar constraint (SpeechRecognitionGrammarFileConstraint)

SRGS grammar files must be added to the constraints collection of a speech recognizer.

The Speech Recognition Grammar Specification (SRGS) Version 1.0 is the industry-standard markup language for creating XML-format grammars for speech recognition. Although Universal Windows apps provide alternatives to SRGS for creating speech-recognition grammars, you might find that SRGS produces the best results, particularly for more involved speech-recognition scenarios.

SRGS grammars provide a full set of features to help you architect complex voice interaction for your apps. For example, with SRGS grammars you can:

  • Specify the order in which words and phrases must be spoken to be recognized.
  • Combine words from multiple lists and phrases to be recognized.
  • Link to other grammars.
  • Assign a weight to an alternative word or phrase to increase or decrease the likelihood that it will be used to match speech input.
  • Include optional words or phrases.
  • Use special rules that help filter out unspecified or unanticipated input, such as random speech that doesn't match the grammar, or background noise.
  • Use semantics to define what speech recognition means to your app.
  • Specify pronunciations, either inline in a grammar or via a link to a lexicon.

See the SRGS Grammar XML Reference for more info about SRGS elements and attributes. To get started creating an SRGS grammar, see How to Create a Basic XML Grammar.

Keep the following points in mind:

  • You can add multiple grammar-file constraints to a constraints collection.
  • Use the .grxml file extension for XML-based grammar documents that conform to SRGS rules.

This example uses an SRGS grammar defined in a file named Colors.grxml. In the file properties, Build Action is set to Content with Copy to Output Directory set to Copy always:

private async void Colors_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Add a grammar file constraint to the recognizer.
    var storageFile = await Windows.Storage.StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Colors.grxml"));
    var grammarFileConstraint = new Windows.Media.SpeechRecognition.SpeechRecognitionGrammarFileConstraint(storageFile, "colors");

    speechRecognizer.UIOptions.ExampleText = @"Ex. 'blue background', 'green text'";
    speechRecognizer.Constraints.Add(grammarFileConstraint);

    // Compile the constraint.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}

An SRGS grammar can also include semantic interpretation tags. These tags provide a mechanism for returning grammar match data to your app. Grammars must conform to the World Wide Web Consortium (W3C) Semantic Interpretation for Speech Recognition (SISR) 1.0 specification.

The following grammar file (srgs.grxml), which can be loaded in the same way as Colors.grxml, listens for variants of "yes" and "no".

<grammar xml:lang="en-US" 
         root="yesOrNo"
         version="1.0" 
         tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar">

    <!-- The following rules recognize variants of yes and no. -->
      <rule id="yesOrNo">
         <one-of>
            <item>
              <one-of>
                 <item>yes</item>
                 <item>yeah</item>
                 <item>yep</item>
                 <item>yup</item>
                 <item>un huh</item>
                 <item>yay yus</item>
              </one-of>
              <tag>out="yes";</tag>
            </item>
            <item>
              <one-of>
                 <item>no</item>
                 <item>nope</item>
                 <item>nah</item>
                 <item>uh uh</item>
               </one-of>
               <tag>out="no";</tag>
            </item>
         </one-of>
      </rule>
</grammar>

Step 5: Manage constraints

After a constraint collection is loaded for recognition, your app can manage which constraints are enabled for recognition operations by setting the IsEnabled property of a constraint to true or false. The default setting is true.

It's usually more efficient to load constraints once, and then enable and disable them as needed, rather than to load, unload, and compile constraints for each recognition operation. Use the IsEnabled property as required.

Restricting the number of constraints serves to limit the amount of data that the speech recognizer needs to search and match against the speech input. This can improve both the performance and the accuracy of speech recognition.

Decide which constraints are enabled based on the phrases that your app can expect in the context of the current recognition operation. For example, if the current app context is to display a color, you probably don't need to enable a constraint that recognizes the names of animals.
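
For example, the following minimal sketch assumes that the recognizer and the "yesOrNo" and "colors" constraints from the earlier steps were created and compiled once during initialization and stored in fields; the field and handler names are hypothetical. Before each recognition operation, it enables only the constraint that fits the current app context:

private Windows.Media.SpeechRecognition.SpeechRecognizer speechRecognizer;
private Windows.Media.SpeechRecognition.SpeechRecognitionListConstraint yesOrNoConstraint;
private Windows.Media.SpeechRecognition.SpeechRecognitionGrammarFileConstraint colorsConstraint;

private async void RecognizeColor_Click(object sender, RoutedEventArgs e)
{
    // Enable only the constraint that matches the current app context.
    // Disabled constraints stay loaded and compiled; no recompilation is needed.
    yesOrNoConstraint.IsEnabled = false;
    colorsConstraint.IsEnabled = true;

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}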

To prompt the user for what can be spoken, use the SpeechRecognizerUIOptions.AudiblePrompt and SpeechRecognizerUIOptions.ExampleText properties, which are set by means of the SpeechRecognizer.UIOptions property. Preparing users for what they can say during the recognition operation increases the likelihood that they will speak a phrase that can be matched to an active constraint.

Related topics

Responding to speech interactions

Designers

Speech design guidelines