Creating Speech Recognition Calculators in UCMA 3.0: Grammar Creation (Part 2 of 4)
Summary: Add speech recognition and speech synthesis to your Microsoft Unified Communications Managed API (UCMA) 3.0 application by incorporating the recognition and synthesis APIs of Microsoft Speech Platform SDK. Part 2 describes how to create and load either of two forms of the grammar that is used in the application that is described in this series of articles.
Applies to: Microsoft Unified Communications Managed API (UCMA) 3.0 Core SDK | Microsoft Speech Platform SDK
Published: November 2011 | Provided by: Mark Parker, Microsoft | About the Author
Contents
Graphical Representation of the Speech Recognition Grammar
Semantic Values
GrammarBuilder Grammar
SRGS XML Grammar
Loading the Grammar
Part 3
Additional Resources
This article is the second in a four-part series of articles about how to create a calculator that uses speech recognition and speech synthesis.
Creating Speech Recognition Calculators in UCMA 3.0: Introduction (Part 1 of 4)
Creating Speech Recognition Calculators in UCMA 3.0: UCMA Infrastructure (Part 3 of 4)
Creating Speech Recognition Calculators in UCMA 3.0: Code Listing and Conclusion (Part 4 of 4)
Graphical Representation of the Speech Recognition Grammar
Two functionally equivalent grammars are presented in this article. The first grammar is created at run time by using the GrammarBuilder class. The second grammar is a Speech Recognition Grammar Specification (SRGS) XML grammar that is loaded into memory by the application. The SRGS grammar is shown for reference.
Important
An application should use only one of the grammars that are discussed in this article.
Figure 1 is a graphical representation of the grammars that appear in part 2. For a user’s utterance to match either grammar, there must be exact agreement between what the user says and one of the paths shown in figure 1. If the user says “exit,” there is a match along the upper path. If the user says “how much is fifteen divided by five” or “eight times sixteen,” there is a match along the lower path.
Figure 1. Graphical representation of grammar
The grammar can also be expressed in Backus-Naur Form (BNF).
<expression> ::= “exit” | <arithmetic_expression> | “how much is” <arithmetic_expression>
<arithmetic_expression> ::= <number> <operation> <number>
<number> ::= “zero” | “one” | “two” | “three” | “four” |
             “five” | “six” | “seven” | “eight” | “nine” |
             “ten” | “eleven” | “twelve” | “thirteen” | “fourteen” |
             “fifteen” | “sixteen” | “seventeen” | “eighteen” | “nineteen” | “twenty”
<operation> ::= “plus” | “and” | “minus” | “times” | “multiplied by” | “divided by”
Important
The Backus-Naur notation is for informational purposes only, to aid you in understanding the grammar structure. The speech recognition engine in the Microsoft.Speech assembly does not accept a grammar in Backus-Naur Form.
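The paths that the notation describes can also be checked in plain code. The following is a minimal sketch, not part of the article's application: the GrammarPathSketch class and its Matches method are hypothetical names, and the code works on text instead of speech. It only illustrates which phrases follow a path through the grammar.

```csharp
using System;
using System.Linq;

// Hypothetical helper: reports whether a text phrase follows one of the
// paths that the Backus-Naur notation allows. It does not use the speech
// recognition engine.
public static class GrammarPathSketch
{
    private static readonly string[] Numbers =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"
    };

    private static readonly string[] Operations =
    {
        "plus", "and", "minus", "times", "multiplied by", "divided by"
    };

    public static bool Matches(string utterance)
    {
        string text = utterance.Trim().ToLowerInvariant();

        // <expression> ::= "exit" | ...
        if (text == "exit")
        {
            return true;
        }

        // The "how much is" prefix is optional.
        const string prefix = "how much is ";
        if (text.StartsWith(prefix, StringComparison.Ordinal))
        {
            text = text.Substring(prefix.Length);
        }

        // <arithmetic_expression> ::= <number> <operation> <number>
        foreach (string operation in Operations)
        {
            string separator = " " + operation + " ";
            int index = text.IndexOf(separator, StringComparison.Ordinal);
            if (index < 0)
            {
                continue;
            }

            string left = text.Substring(0, index);
            string right = text.Substring(index + separator.Length);
            if (Numbers.Contains(left) && Numbers.Contains(right))
            {
                return true;
            }
        }

        return false;
    }
}
```

With this sketch, Matches("how much is fifteen divided by five") and Matches("eight times sixteen") return true, and Matches("eight times") returns false because the second number is missing.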
Semantic Values
A simple grammar can return the word or phrase that it recognizes. In many cases, a grammar is designed to return information about the meaning of an utterance instead of the utterance itself. For the purposes of the application in this series of articles, the grammar should extract the meaning of what was said. For example, if the user says “how much is nine multiplied by twelve,” the intent of this question is to evaluate 9 * 12. These three items, two numbers and a symbol for the operation to use, are the semantic items of interest in the user’s utterance.
The grammars that are presented here return semantic values that depend on what the user says. If the recognized utterance is any of the words “zero,” “one,” “two,” through “twenty,” the semantic value that is returned is the corresponding numeric value, 0 through 20. Similarly, if the utterance matches “and” or “plus,” the returned semantic value will be the string “+”.
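The mapping from spoken words to semantic values can be pictured as two lookup tables. The following sketch is an illustration only: the SemanticValueSketch class and its method names are hypothetical, and the grammars in this article perform this mapping inside the speech recognition engine, not in application code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the word-to-value mapping that the
// grammars perform.
public static class SemanticValueSketch
{
    // The position of each word in the array is its numeric value.
    private static readonly string[] NumberWords =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"
    };

    // Several spoken forms can share one semantic value.
    private static readonly Dictionary<string, string> OperatorSymbols =
        new Dictionary<string, string>
        {
            { "plus", "+" },
            { "and", "+" },
            { "minus", "-" },
            { "times", "*" },
            { "multiplied by", "*" },
            { "divided by", "/" }
        };

    // Returns 0 through 20 for a number word, or -1 if the word is not
    // in the grammar.
    public static int NumberValue(string word)
    {
        return Array.IndexOf(NumberWords, word);
    }

    // Returns the operator symbol for a spoken operation, or null if the
    // phrase is not in the grammar.
    public static string OperatorSymbol(string phrase)
    {
        string symbol;
        return OperatorSymbols.TryGetValue(phrase, out symbol) ? symbol : null;
    }
}
```

For the utterance “how much is nine multiplied by twelve,” NumberValue("nine") returns 9, OperatorSymbol("multiplied by") returns "*", and NumberValue("twelve") returns 12.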
GrammarBuilder Grammar
The following example is the definition of the CreateGrammar method, which returns a GrammarBuilder object.
private GrammarBuilder CreateGrammar()
{
    // gb[0] matches "exit"; gb[1] matches an arithmetic expression.
    GrammarBuilder[] gb = new GrammarBuilder[] { null, null };
    gb[0] = new GrammarBuilder(new Choices("exit"));

    gb[1] = new GrammarBuilder();
    // "how much is" is optional: it can appear zero times or one time.
    gb[1].Append("how much is", 0, 1);

    string[] numberString = { "zero", "one", "two", "three", "four", "five",
                              "six", "seven", "eight", "nine", "ten",
                              "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                              "sixteen", "seventeen", "eighteen", "nineteen", "twenty" };

    // The array index of each word is also its semantic value (0 through 20).
    Choices numberChoices = new Choices();
    for (int i = 0; i < numberString.Length; i++)
    {
        numberChoices.Add(new SemanticResultValue(numberString[i], i));
    }
    gb[1].Append(new SemanticResultKey("number1", (GrammarBuilder)numberChoices));

    // Map each spoken operation to its operator symbol.
    Choices operatorChoices = new Choices();
    operatorChoices.Add(new SemanticResultValue("plus", "+"));
    operatorChoices.Add(new SemanticResultValue("and", "+"));
    operatorChoices.Add(new SemanticResultValue("minus", "-"));
    operatorChoices.Add(new SemanticResultValue("times", "*"));
    operatorChoices.Add(new SemanticResultValue("multiplied by", "*"));
    operatorChoices.Add(new SemanticResultValue("divided by", "/"));
    gb[1].Append(new SemanticResultKey("operator", (GrammarBuilder)operatorChoices));
    gb[1].Append(new SemanticResultKey("number2", (GrammarBuilder)numberChoices));

    // The finished grammar matches either of the two alternatives.
    Choices choices = new Choices(gb);
    return new GrammarBuilder(choices);
}
The Choices class is used when a grammar consists of several mutually exclusive options. One Choices object is used to hold the 21 numbers that are recognized by the grammar, and another is used to hold the six options for the operation to be performed. A third Choices object is created near the end of the previous code sample. This object is initialized with a two-element array of GrammarBuilder instances. One of the GrammarBuilder array elements contains the string “exit.” The other contains the part of the grammar that recognizes a number, followed by an operation, followed by a second number.
A SemanticResultValue instance is created for each of the 21 numbers that are defined in the grammar, and for each of the six operations that are defined. These SemanticResultValue instances make it possible for the grammar to return a value that differs from what the user spoke. For example, if the user says “twelve,” the grammar returns the number 12. Similarly, if the user says “plus,” the grammar returns the string “+”. Working with SemanticResultValue instances simplifies the logic in the application.
Three SemanticResultKey instances are appended to the grammar. These instances, number1, operator, and number2, define keys that are used to access the list of key-value pairs that are returned by the grammar. For more information about how the keys are used, see Creating Speech Recognition Calculators in UCMA 3.0: UCMA Infrastructure (Part 3 of 4).
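As a rough illustration of how the three keys might be read, the following sketch shows a handler for the SpeechRecognized event. It is not the handler that the application in this series uses. The RecognitionHandlerSketch class, the TryEvaluate helper, and the console output are assumptions made for this example. The sketch treats a missing or empty operator key as the “exit” path.

```csharp
using System;
using Microsoft.Speech.Recognition;

// Hypothetical handler: reads the number1, operator, and number2 keys
// from a recognition result and evaluates the expression.
public static class RecognitionHandlerSketch
{
    // Attach with:
    // speechRecognitionEngine.SpeechRecognized += RecognitionHandlerSketch.OnSpeechRecognized;
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        SemanticValue semantics = e.Result.Semantics;

        // Assumption: no operator key, or an empty operator, means that
        // the user said "exit".
        string op = semantics.ContainsKey("operator")
            ? Convert.ToString(semantics["operator"].Value)
            : string.Empty;
        if (op.Length == 0)
        {
            Console.WriteLine("Recognized: " + e.Result.Text);
            return;
        }

        // Convert.ToInt32 is used so that the sketch does not depend on
        // the exact numeric type that the grammar returns.
        int number1 = Convert.ToInt32(semantics["number1"].Value);
        int number2 = Convert.ToInt32(semantics["number2"].Value);

        double result;
        if (TryEvaluate(number1, op, number2, out result))
        {
            Console.WriteLine(number1 + " " + op + " " + number2 + " = " + result);
        }
        else
        {
            Console.WriteLine("The expression cannot be evaluated.");
        }
    }

    // Evaluates the expression. Returns false for division by zero or
    // for an operator symbol that the grammar does not define.
    public static bool TryEvaluate(int left, string op, int right, out double result)
    {
        result = 0;
        switch (op)
        {
            case "+": result = left + right; return true;
            case "-": result = left - right; return true;
            case "*": result = left * right; return true;
            case "/":
                if (right == 0)
                {
                    return false;
                }
                result = (double)left / right;
                return true;
            default:
                return false;
        }
    }
}
```

Because the grammar already returns numbers and operator symbols, the handler needs no string parsing of its own. It only has to guard against division by zero, which the grammar allows (“five divided by zero” is a valid utterance).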
SRGS XML Grammar
The SRGS XML grammar that appears in this section is made up of three rules: a main rule whose name (id) is “Expression”, a rule whose name is “Number”, and a rule whose name is “Operator”.
The following example is the definition of the SRGS XML grammar.
<?xml version="1.0" encoding="UTF-8" ?>
<grammar version="1.0" xml:lang="en-US" mode="voice" root="Expression"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="Expression" scope="public">
    <example>four plus seven</example>
    <example>four and seven</example>
    <example>how much is four multiplied by seven</example>
    <tag>out.number1=0; out.operator = "";out.number2=0;</tag>
    <one-of>
      <item>exit</item>
      <item>
        <item repeat="0-1">how much is</item>
        <ruleref uri="#Number" type="application/srgs+xml"/>
        <tag>out.number1=rules.latest();</tag>
        <ruleref uri="#Operator" type="application/srgs+xml"/>
        <tag>out.operator=rules.latest();</tag>
        <ruleref uri="#Number" type="application/srgs+xml"/>
        <tag>out.number2=rules.latest();</tag>
      </item>
    </one-of>
  </rule>
  <rule id="Number">
    <one-of>
      <item> zero <tag>out = 0; </tag> </item>
      <item> one <tag>out = 1; </tag> </item>
      <item> two <tag>out = 2; </tag> </item>
      <item> three <tag>out = 3; </tag> </item>
      <item> four <tag>out = 4; </tag> </item>
      <item> five <tag>out = 5; </tag> </item>
      <item> six <tag>out = 6; </tag> </item>
      <item> seven <tag>out = 7; </tag> </item>
      <item> eight <tag>out = 8; </tag> </item>
      <item> nine <tag>out = 9; </tag> </item>
      <item> ten <tag>out = 10; </tag> </item>
      <item> eleven <tag>out = 11; </tag> </item>
      <item> twelve <tag>out = 12; </tag> </item>
      <item> thirteen <tag>out = 13; </tag> </item>
      <item> fourteen <tag>out = 14; </tag> </item>
      <item> fifteen <tag>out = 15; </tag> </item>
      <item> sixteen <tag>out = 16; </tag> </item>
      <item> seventeen <tag>out = 17; </tag> </item>
      <item> eighteen <tag>out = 18; </tag> </item>
      <item> nineteen <tag>out = 19; </tag> </item>
      <item> twenty <tag>out = 20; </tag> </item>
    </one-of>
  </rule>
  <rule id="Operator">
    <one-of>
      <item> plus <tag>out = "+"; </tag> </item>
      <item> and <tag>out = "+"; </tag> </item>
      <item> minus <tag>out = "-"; </tag> </item>
      <item> times <tag>out = "*"; </tag> </item>
      <item> multiplied by <tag>out = "*"; </tag> </item>
      <item> divided by <tag>out = "/"; </tag> </item>
    </one-of>
  </rule>
</grammar>
The Expression rule contains a one-of element with two alternatives. The first alternative is the single word “exit.” The second alternative is a sequence that consists of an optional “how much is” item followed by three ruleref elements. Following each ruleref element is a tag element that assigns the semantic value that is returned by the referenced rule to the appropriate semantic key. For more information about how the keys are used, see Creating Speech Recognition Calculators in UCMA 3.0: UCMA Infrastructure (Part 3 of 4).
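Because each ruleref element refers to another rule by name, a misspelled name is an easy mistake to make when editing the grammar by hand. The following sketch, which is not part of the article's application, uses LINQ to XML to list local rule references that do not match any rule id. The SrgsRuleCheckSketch class and the FindDanglingRuleRefs method are hypothetical names, and the check covers only references of the form uri="#Name" within a single file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: finds ruleref elements that point at a rule id
// that is not defined in the same SRGS XML document.
public static class SrgsRuleCheckSketch
{
    private static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    public static List<string> FindDanglingRuleRefs(string grammarXml)
    {
        XDocument document = XDocument.Parse(grammarXml);

        // Collect the id of every rule that the grammar defines.
        HashSet<string> ruleIds = new HashSet<string>(
            document.Descendants(Srgs + "rule")
                    .Select(rule => (string)rule.Attribute("id")));

        // Keep only local references ("#Name") whose target is missing.
        return document.Descendants(Srgs + "ruleref")
                       .Select(reference => (string)reference.Attribute("uri"))
                       .Where(uri => uri != null && uri.StartsWith("#"))
                       .Select(uri => uri.Substring(1))
                       .Where(name => !ruleIds.Contains(name))
                       .Distinct()
                       .ToList();
    }
}
```

For the grammar in this section, the method would return an empty list, because both #Number and #Operator refer to rules that the grammar defines.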
Loading the Grammar
The two grammars that are presented in part 2 differ in how they are loaded.
To load the GrammarBuilder grammar
Create a Grammar instance by using the constructor that takes a GrammarBuilder object as its parameter. The CreateGrammar method that is defined earlier in this article returns a GrammarBuilder instance.
Grammar gr = new Grammar(CreateGrammar());
Call the LoadGrammarAsync method on the speech recognition engine.
speechRecognitionEngine.LoadGrammarAsync(gr);
To load the SRGS XML grammar
Create a Grammar instance by using the constructor that takes a string that contains the full path and file name of the grammar file. In the example shown here, the second argument is the name of the main rule.
String currDirPath = Environment.CurrentDirectory;
Grammar gr = new Grammar(currDirPath + @"\NumberOpNumber.grxml", "Expression");
Call the LoadGrammarAsync method on the speech recognition engine.
speechRecognitionEngine.LoadGrammarAsync(gr);
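Because LoadGrammarAsync returns before the grammar has finished loading, an application may want to confirm that the load succeeded before it starts recognition. The following sketch shows one way to do this with the LoadGrammarCompleted event. The GrammarLoadSketch class, the method names, and the console output are assumptions made for this example, and the application in this series might handle loading differently.

```csharp
using System;
using Microsoft.Speech.Recognition;

// Hypothetical helper: loads a grammar asynchronously and reports
// whether the load succeeded.
public static class GrammarLoadSketch
{
    public static void LoadWithCompletionCheck(
        SpeechRecognitionEngine speechRecognitionEngine, Grammar gr)
    {
        // Subscribe before starting the load so that the completion
        // event cannot be missed.
        speechRecognitionEngine.LoadGrammarCompleted += OnLoadGrammarCompleted;
        speechRecognitionEngine.LoadGrammarAsync(gr);
    }

    private static void OnLoadGrammarCompleted(
        object sender, LoadGrammarCompletedEventArgs e)
    {
        // Error is inherited from AsyncCompletedEventArgs and is null
        // when the grammar loaded successfully.
        if (e.Error != null)
        {
            Console.WriteLine("The grammar failed to load: " + e.Error.Message);
            return;
        }

        Console.WriteLine("The grammar finished loading.");
    }
}
```

The same handler works for both grammars in this article, because both are wrapped in a Grammar instance before they are loaded.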
Part 3
Creating Speech Recognition Calculators in UCMA 3.0: UCMA Infrastructure (Part 3 of 4)
Additional Resources
For more information, see the following resources:
Unified Communications Managed API 3.0 Core SDK Documentation
Microsoft Speech Platform – Software Development Kit (SDK) (Version 10.2)
About the Author
Mark Parker is a programming writer at Microsoft whose current responsibility is the UCMA SDK documentation. Mark previously worked on the Microsoft Speech Server 2007 documentation.