Using Grammar Files

Grammar files do two things:

  • Provide rules that define all possible combinations of the words or phrases that a user can speak to an application. These rules allow the speech recognition engine to convert speech to text.
  • Add semantic information to the recognized text.

Grammar Files and Semantic Interpretation

Grammar authors implement semantic interpretation by associating semantic Script Tag elements with relevant elements in a grammar. The Script Tag elements contain the property name/value pairs and JScript scripts that generate the content of a semantic result. The speech recognition engine generates semantic results in a stream of XML text called Semantic Markup Language (SML).

Note  Upon successful recognition of a spoken phrase, the recognition engine creates a path through the grammar rule that matches the spoken phrase. This process activates and runs any scripts that appear on Script Tag elements along the path.

The following code example illustrates a sample of SML text created by the recognition engine when it recognizes the phrase "I'd like reservations for two at seven o'clock."

  <SML text="I'd like reservations for two at seven o'clock" utteranceConfidence="1.000">
      I'd like reservations for two at seven o'clock
  </SML>

Without modification, this stream of SML text may not be useful to an application, for two main reasons:

  • The SML text contains the text of an entire spoken phrase. An application rarely needs all of the text in a spoken phrase. For example, although a grammar rule or rules should enable the recognition engine to recognize the entire phrase "I'd like reservations for two at seven o'clock," the application may only need the number in the party, "two," and the time, "seven o'clock."
  • Additional data or reformatted text may be more relevant to the application than the actual text of the spoken phrase. Using the previous example, the application may need the number in the party, "two," in numeric format, such as 2, and the time, "seven o'clock," in time format, such as 07:00 or 19:00.
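The kind of reformatting described above can be sketched in a few lines of Python. This is only an illustration of the conversion itself, not of how the recognition engine or the grammar performs it; the `NUMBER_WORDS` table, the function names, and the `evening` flag are all hypothetical.

```python
# Hypothetical word-to-number table. In a real grammar, values like these
# would be supplied by the semantic information attached to grammar rules.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}


def normalize_party(word):
    """Convert a spoken number word such as "two" to an integer."""
    return NUMBER_WORDS[word]


def normalize_time(phrase, evening=False):
    """Convert a phrase such as "seven o'clock" to 24-hour HH:MM text."""
    hour_word = phrase.split()[0]
    hour = NUMBER_WORDS[hour_word] % 12  # "twelve" wraps to 0
    if evening:
        hour += 12
    return f"{hour:02d}:00"


print(normalize_party("two"))
print(normalize_time("seven o'clock"))
print(normalize_time("seven o'clock", evening=True))
```

The spoken phrase alone does not say whether "seven o'clock" means morning or evening, which is why the sketch takes that choice as an explicit argument.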

By adding semantic information to grammar rules, grammar authors can ensure that the speech recognition engine provides an application with meaningful and useful data. The following code example illustrates a sample of SML text created by the recognition engine when it matches the phrase "I'd like reservations for two at seven o'clock" against a rule that contains semantic information.

  <SML confidence="1.000" text="I'd like reservations for two at seven o'clock"
       utteranceConfidence="1.000">
      <number confidence="1.000">2</number>
      <time confidence="1.000">07:00</time>
  </SML>
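
As a rough illustration of why this form is easier to consume, the following Python sketch reads the party size and time from the SML text shown above. It assumes the application receives the SML as a string; the `read_reservation` function and the dictionary keys are hypothetical names, and the sketch uses only the standard `xml.etree.ElementTree` module rather than any SASDK API.

```python
import xml.etree.ElementTree as ET

# The SML sample from the text above, held as a string for illustration.
SML = """<SML confidence="1.000" text="I'd like reservations for two at seven o'clock"
     utteranceConfidence="1.000">
    <number confidence="1.000">2</number>
    <time confidence="1.000">07:00</time>
</SML>"""


def read_reservation(sml_text):
    """Return the semantic values carried by an SML result as a dictionary."""
    root = ET.fromstring(sml_text)
    return {
        "party_size": int(root.findtext("number")),
        "time": root.findtext("time"),
        "utterance_confidence": float(root.get("utteranceConfidence")),
    }


reservation = read_reservation(SML)
print(reservation["party_size"], reservation["time"])
```

The application never has to search the full recognized phrase; it reads the two named elements directly.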

Add semantic information to grammar rules using Semantic Script Editor.

Grammar Development

Grammars consist of XML markup in a format defined in the World Wide Web Consortium Speech Recognition Grammar Specification Version 1.0 (W3C SRGS). The W3C SRGS defines the XML elements and attributes used to write grammars. Apart from a few minor differences, the grammar XML markup in the Microsoft Speech Application SDK (SASDK) conforms to the W3C SRGS standard.

Because grammar files are built and stored in an XML format, grammar authors can create them manually using a plain text editor such as Notepad. However, this approach becomes tedious and error-prone when creating long grammars or sets of grammars.
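
To give a sense of what a hand-written grammar involves, the following Python sketch holds a minimal grammar in W3C SRGS XML form and runs two basic checks on it: that the XML is well-formed, and that every local rule reference points to a rule that exists. The grammar content, the rule names (`reservation`, `party`, `hour`), and the `check_grammar` function are assumptions made for illustration. The sketch contains no semantic Script Tag elements and is not a substitute for testing a grammar against the recognition engine.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# A minimal hand-written grammar (illustrative rule names, no semantic tags).
GRAMMAR = """<?xml version="1.0"?>
<grammar version="1.0" xml:lang="en-US" root="reservation"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="reservation" scope="public">
    <item>I'd like reservations for</item>
    <ruleref uri="#party"/>
    <item>at</item>
    <ruleref uri="#hour"/>
    <item>o'clock</item>
  </rule>
  <rule id="party">
    <one-of>
      <item>two</item>
      <item>four</item>
    </one-of>
  </rule>
  <rule id="hour">
    <one-of>
      <item>six</item>
      <item>seven</item>
    </one-of>
  </rule>
</grammar>"""


def check_grammar(text):
    """Return (rule ids, unresolved local rule references) for a grammar."""
    root = ET.fromstring(text)  # raises ParseError if the XML is malformed
    rule_ids = {rule.get("id") for rule in root.iter(f"{{{SRGS_NS}}}rule")}
    # Only local references of the form "#name" are checked here.
    local_refs = {
        ref.get("uri")[1:]
        for ref in root.iter(f"{{{SRGS_NS}}}ruleref")
        if ref.get("uri", "").startswith("#")
    }
    return sorted(rule_ids), sorted(local_refs - rule_ids)


rule_ids, missing = check_grammar(GRAMMAR)
print(rule_ids, missing)
```

A misspelled rule name or an unclosed element is easy to introduce by hand in a grammar even this small, which is the kind of mistake a graphical editor helps avoid.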

To simplify the process of creating and maintaining grammar files, the Microsoft Speech Application SDK Version 1.1 (SASDK) provides Speech Grammar Editor. Use Speech Grammar Editor to create grammar files by dragging graphical elements, which represent grammar XML markup, onto the Rule Editor design canvas in Visual Studio .NET 2003. You can also add semantic Script Tag elements to elements in a grammar, and test words or phrases against the grammar to verify whether the recognition engine will recognize them. The graphical approach implemented by Speech Grammar Editor enables grammar authors to concentrate on creating the actual grammar rules instead of on laborious tasks such as verifying proper element syntax.

See Also

Enabling Speech Recognition | Creating Grammars | Grammar Design