Constructing Grammars

SRGS grammars are XML documents and must be well-formed and valid. This topic describes their elements and attributes and includes the following sections:

  • XML elements
  • Basic grammar structure
    • The <grammar> element
    • The root rule
  • Defining phrases
    • The <item> element
    • The <one-of> element
    • Weighted <item> elements
    • Repeated <item> elements
  • Using more than one rule
    • Subrules
    • Using the <ruleref> element
  • Special rules

XML elements

The main XML elements are listed here with brief descriptions. Numerous examples later in this document show how to use them.

<grammar> -- The root tag of an SRGS grammar. It is required, with the appropriate attributes.

<rule> -- A child of <grammar>. All SRGS grammars are composed of rules. There can be more than one rule in a grammar, in which case one of the rules must be the primary, "top," or "root" rule. The others are "subgrammars" that can be called from the top rule using <ruleref> elements.

Note

Empty rules are illegal.

<item> -- One of the possible speaker responses that the application expects, as listed in the grammar. It can be a child of a <rule> element, a child of another <item> element, or a child of a <one-of> element.

<one-of> -- A child of a <rule> or <item> element. Its children are a list of alternative <item> elements. Only one child item in the <one-of> list can match.

<ruleref> -- A reference to a subgrammar or to a stand-alone external grammar file. It can appear anywhere a word or DTMF digit is allowed.

<tag> -- A child of an <item> or <rule> element. It tells the grammar how to return values to the VoiceXML application when its parent <item> or <rule> is a match. If the parent <item> or <rule> is not a match, the contents of the <tag> element are ignored.
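As a minimal sketch of how a <tag> attaches a return value to its parent <item>, the fragment below assumes that tag-format="semantics/1.0" is declared on the enclosing <grammar> element. The airport codes are illustrative assumptions, not values defined elsewhere in this document.

<one-of>
   <!-- "out" is the result variable of the enclosing rule in semantics/1.0 -->
   <item>boston <tag>out = "BOS";</tag></item>
   <item>chicago <tag>out = "ORD";</tag></item>
</one-of>

If the caller says "chicago," only the second <tag> is evaluated. The first is ignored because its parent <item> did not match.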

Basic grammar structure

The <grammar> element

The <grammar> element is the root element of your grammar. It has a number of attributes. The most common are:

mode -- voice or dtmf. If the mode attribute is not present, voice is the default. Example: mode="voice"

language -- Written as xml:lang. Required if mode="voice", otherwise optional. Inline grammars can inherit this attribute from the <vxml> element attributes. Example: xml:lang="en-US"

version -- The grammar version. Required; the value is 1.0. Example: version="1.0"

tag-format -- The content type of <tag> elements: either semantic ECMAScript or string literals. Required if <tag> elements are used. See Semantic Interpretation for details. Examples: tag-format="semantics/1.0" (semantic ECMAScript), tag-format="semantics/1.0-literals" (string literals)

root -- The ID of the root rule in the grammar. Required by the Tellme Platform, except for external grammars that are referenced with a fragment reference (#). Example: root="topRuleName"

xmlns -- XML namespace. Do not use in inline grammars. Example: xmlns="http://www.w3.org/2001/06/grammar"
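A minimal sketch of a touch-tone grammar that uses only the attributes described above. It assumes an inline grammar (so xmlns is omitted) and a hypothetical rule name, yesNo. The language attribute is left out because it is optional when mode is dtmf.

<grammar mode="dtmf" version="1.0" root="yesNo">
   <rule id="yesNo">
      <one-of>
         <item>1</item>
         <item>2</item>
      </one-of>
   </rule>
</grammar>

A match occurs when the caller presses 1 or 2.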

The root rule

Inline grammars

For inline grammars, every <grammar> element must have a root attribute.

Here is a simple example:

<grammar  mode="voice" xml:lang="en-US" 
          tag-format="semantics/1.0"
          version="1.0" root="location">
   <rule id="location">
      <item>charleston south carolina </item>
   </rule>
</grammar>

A match occurs if and only if the speaker’s utterance consists of all three words in sequence and nothing else.

Stand-alone grammars

An external grammar does not need a root attribute provided that VoiceXML applications always reference it with a fragment reference like this: <grammar src="externalgrammar.grxml#ruleName"/>

In this case, the fragment reference identifies the rule to be used.

If the external grammar is referenced without a fragment reference, its <grammar> element must include a root attribute.
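For comparison with the inline example, here is a sketch of the same rule written as a stand-alone file. The file name externalgrammar.grxml and the rule name ruleName come from the reference shown above. The XML declaration, the xmlns attribute, and scope="public" are assumptions about what a stand-alone file would typically carry.

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         mode="voice" xml:lang="en-US"
         version="1.0" root="ruleName">
   <rule id="ruleName" scope="public">
      <item>charleston south carolina</item>
   </rule>
</grammar>

Because the root attribute is declared, this file could be referenced either as externalgrammar.grxml or as externalgrammar.grxml#ruleName.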

Rule attributes

Rules have two attributes, id and scope.

  • id -- the name of the rule, for example id="topRule". This attribute is required.
  • scope -- either public or private. If a rule definition does not declare a scope, it defaults to private. A public rule can be referenced from rule definitions in other grammars in the VoiceXML application. A private rule can be referenced only by other rules within the same grammar.
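A sketch of the two scopes side by side. The rule names and phrases are hypothetical.

<grammar mode="voice" xml:lang="en-US"
         version="1.0" root="drink">
   <!-- public: rules in other grammars may reference this rule -->
   <rule id="drink" scope="public">
      <item>a</item>
      <ruleref uri="#size"/>
      <item>cola</item>
   </rule>
   <!-- no scope attribute, so this rule defaults to private -->
   <rule id="size">
      <one-of>
         <item>small</item>
         <item>large</item>
      </one-of>
   </rule>
</grammar>

Here another grammar could reference the drink rule, but the size rule is reachable only from rules inside this grammar.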

Defining phrases

The <item> element

The <item> element encloses a word, phrase, or DTMF tone to be matched.

In simple grammars the <item> tags are not required. However, it is good practice to always include the <item> tags and that is done throughout this document.
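To illustrate, the two hypothetical rules below accept the same phrase. The second follows the recommended practice of wrapping the phrase in <item> tags.

<rule id="greetingBare">
   good morning
</rule>

<rule id="greetingItem">
   <item>good morning</item>
</rule>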

The <one-of> element

A <one-of> element presents a list of alternatives. Only one can match. The alternative words or phrases must be enclosed in <item> tags. For example,

<one-of>
   <item>california</item>
   <item>arizona</item>
   <item>oregon</item>
   <item>washington</item>
</one-of>

The speech recognition engine finds a match between the speaker's utterance and this grammar only if one of the four states is spoken.

Weighted <item> elements

In <one-of> lists, the <item> elements can be weighted. Weights are positive numbers. If weights are used, each <item> in the <one-of> list must be weighted. The relative weight for each <item> is its contribution to the whole. For example:

<one-of>
  <item weight="3.14">pie</item>
  <item weight="1.41">root beer</item>
  <item weight=".25">cola</item>
</one-of>

Here, the relative weights are:

  • pie: 3.14/(3.14 + 1.41 + .25) = 3.14/4.8 = 0.654
  • root beer: 1.41/4.8 = 0.294
  • cola: .25/4.8 = 0.052

Repeated <item> elements

Words in a rule can be repeated. The repeat attribute of the <item> element provides for several possibilities:

<item repeat="0-1">....</item> allows the contents of the <item> element to occur 0 or 1 times. In other words, it is optional.

<item repeat="0-">....</item> allows the contents of the <item> element to occur 0 or more times. In other words, it is optional, but may also appear more than once.

<item repeat="1-">....</item> requires the contents of the <item> element to occur 1 or more times. In other words, it is not optional, but may also appear more than once.

<item repeat="4-6">....</item> allows the contents of the <item> element to occur 4, 5, or 6 times. It must appear at least 4 times.

<item repeat="10">....</item> requires the contents of the <item> element to occur exactly 10 times (think digits and phone numbers, for example).

Here is an example adapted from the W3C’s SRGS 1.0 specification:

<item repeat="0-1"> 
   <item repeat="0-1"> very </item>
   big 
</item> 
pizza
<item repeat="0-">
   <item repeat="0-1">
      <one-of>
         <item>with</item>
         <item>and</item>
      </one-of>
   </item>
   <one-of>
      <item>cheese</item>
      <item>pepperoni</item>
   </one-of>
</item>

Matches to this grammar can include:

  • pizza
  • big pizza
  • very big pizza
  • pizza with cheese
  • big pizza with pepperoni
  • very big pizza with cheese and pepperoni
  • pizza cheese pepperoni
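As a further sketch of an exact repeat count, the hypothetical rules below accept exactly ten spoken digits, as in a phone number. The rule names phoneNumber and digit are assumptions made for illustration.

<rule id="phoneNumber">
   <!-- the digit subrule must match exactly ten times in a row -->
   <item repeat="10">
      <ruleref uri="#digit"/>
   </item>
</rule>

<rule id="digit">
   <one-of>
      <item>zero</item>
      <item>one</item>
      <item>two</item>
      <item>three</item>
      <item>four</item>
      <item>five</item>
      <item>six</item>
      <item>seven</item>
      <item>eight</item>
      <item>nine</item>
   </one-of>
</rule>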

Using more than one rule

Both inline grammars (embedded in the VoiceXML application) and stand-alone grammars (contained in an external file) can include more than one rule. One rule in a grammar is the root rule. Other rules are called "subrules" or "subgrammars".

The <ruleref> element is used within one rule to reference another rule.

Subrules

A subrule (or subgrammar) within a parent inline grammar is defined as another <rule>, in addition to the required root rule. For example:

<grammar mode="voice" xml:lang="en-US" 
         tag-format="semantics/1.0"
         version="1.0" root="topRule">

   <rule id="topRule">
      ........
   </rule>

   <rule id="subRule1">
      ........
   </rule>

   <rule id="subRule2">
     ........
   </rule>

</grammar>

Using the <ruleref> element

The <ruleref> element is used as a reference to a subgrammar or stand-alone external grammar file. It can appear anywhere a word or DTMF tone can appear.

Referencing a subrule

Here is an example. A credit card company is on the lookout for a scam artist named Margo Smith who uses a variety of known aliases, all of which use her real initials. Her known first and last names are listed in the #firstName and #lastName rules shown below.

<grammar mode="voice" xml:lang="en-US" 
         tag-format="semantics/1.0"
         version="1.0" root="mostWanted">
   <rule id="mostWanted" scope="public">
      <ruleref uri="#firstName"/>
      <ruleref uri="#lastName"/>
   </rule>
   <rule id="firstName">
      <one-of>
         <item>margo</item>
         <item>mary</item>
         <item>marilyn</item>
         <item>maryann</item>
         <item>melissa</item>
         <item>meredith</item>
         <item>marge</item>
      </one-of>
   </rule>
   <rule id="lastName">
      <one-of>
         <item>smith</item>
         <item>sullivan</item>
         <item>swift</item>
         <item>swenson</item>
         <item>sanders</item>
         <item>small</item>
         <item>sanchez</item>
         <item>story</item>
         <item>smathers</item>
      </one-of>
   </rule>
</grammar>

The speech recognition engine finds a match between the speaker's utterance and this grammar only if the speaker says one of the listed first names followed immediately by one of the listed last names.

Referencing an external grammar

Suppose you have an application where you ask for the speaker’s city and you have a stand-alone external grammar file with several thousand cities (http://www.ourgrammars.com/usCities.grxml). You can reference this stand-alone external file with the <ruleref> element, like this:

<field name="cityVar">
 <prompt>what is your city</prompt>
 <grammar   mode="voice" xml:lang="en-US" 
            tag-format="semantics/1.0"
            version="1.0" root="city">
   <rule id="city">
      <item>i am going to</item>
      <ruleref uri="http://www.ourgrammars.com/usCities.grxml"/>
   </rule>
 </grammar>
</field>

The speech recognition engine finds a match between the speaker's utterance and this grammar only if the words "i am going to," followed by a valid city name in the usCities grammar, are spoken in sequence.

Referencing a subrule in an external grammar

When a <ruleref> element references an external grammar by its file URI alone, for example, <ruleref uri="http://www.ourgrammars.com/usCities.grxml"/>, the reference resolves to the root rule of that grammar. To reference any other rule in the external grammar, append a fragment reference (#) that names the rule. Suppose that the usCities.grxml grammar contains a subgrammar named "usTowns." It would be referenced as:

<ruleref uri="http://www.ourgrammars.com/usCities.grxml#usTowns"/>

Note

An external grammar does not have to have a root rule. If it does not, every reference to it from your VoiceXML application must use a fragment reference that names the rule to be used.
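A sketch of how usCities.grxml might be organized so that both styles of reference work. The contents are hypothetical: the real file is described only as holding several thousand cities. The sketch assumes that usTowns is declared with scope="public" so that rules in other grammars can reference it.

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         mode="voice" xml:lang="en-US"
         version="1.0" root="usCities">
   <!-- root rule: reached by a reference with no fragment -->
   <rule id="usCities" scope="public">
      <one-of>
         <item>boston</item>
         <item>chicago</item>
         <item><ruleref uri="#usTowns"/></item>
      </one-of>
   </rule>
   <!-- reached from other grammars as usCities.grxml#usTowns -->
   <rule id="usTowns" scope="public">
      <one-of>
         <item>carmel</item>
         <item>sedona</item>
      </one-of>
   </rule>
</grammar>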

Special rules

Three rule names are defined to have special interpretation and processing by a speech recognizer. A grammar must not redefine these rule names.

  1. NULL defines a rule that is automatically matched even if the speaker does not say anything: <ruleref special="NULL"/>
  2. VOID defines a rule that can never be spoken. Inserting VOID into a sequence makes that sequence unspeakable: <ruleref special="VOID"/>
  3. GARBAGE defines a rule that matches any speech: <ruleref special="GARBAGE"/>
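A sketch of NULL and VOID in use, with hypothetical rule content. In the first <one-of>, the NULL alternative lets the caller skip "please," which has the same effect as <item repeat="0-1">please</item>. In the second, the VOID reference inside the "denver" item makes that alternative unspeakable, which is one way to switch off an alternative without deleting it.

<rule id="order">
   <one-of>
      <item>please</item>
      <!-- matched without the caller saying anything -->
      <item><ruleref special="NULL"/></item>
   </one-of>
   <item>fly me to</item>
   <one-of>
      <item>boston</item>
      <!-- VOID can never be spoken, so "denver" can never match -->
      <item>denver <ruleref special="VOID"/></item>
   </one-of>
</rule>

With this rule, "fly me to boston" and "please fly me to boston" both match, while "fly me to denver" does not.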

GARBAGE is a powerful rule (use it with care!) that permits your grammar to ignore extraneous verbiage or utterances that do not conform to expected responses. If your grammar is written with the expectation that the caller is going to say one thing but the caller actually says something else, then a grammar match does not occur. For example,

<grammar mode="voice" xml:lang="en-US"
         tag-format="semantics/1.0"
         version="1.0" root="flight">
   <rule id="flight">
      <item>i want to fly to</item>
      <one-of>
         <item>boston</item>
         <item>chicago</item>
         <item>miami</item>
      </one-of>
   </rule>
</grammar>

If the caller says "I want to fly to Chicago," a match occurs. But if the caller says "Gimme a ticket to Chicago," no match occurs.

Using the special GARBAGE rule can make the grammar work no matter how the caller prefaces the request:

<grammar mode="voice" xml:lang="en-US"
         tag-format="semantics/1.0"
         version="1.0" root="flight">
   <rule id="flight">
      <ruleref special="GARBAGE"/>
      <one-of>
         <item>boston</item>
         <item>chicago</item>
         <item>miami</item>
      </one-of>
   </rule>
</grammar>

The speech recognition engine finds a match between the speaker's utterance and this grammar if anything, followed by one of the three city names, is spoken.
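The same idea can be extended to speech that follows the city name. The sketch below is a variation on the rule above, not a tested configuration. This document does not state whether GARBAGE can match when nothing extra is said, so each reference is wrapped in an optional <item> so that a bare city name still matches.

<rule id="flight">
   <!-- optional filler before the city, such as "gimme a ticket to" -->
   <item repeat="0-1"><ruleref special="GARBAGE"/></item>
   <one-of>
      <item>boston</item>
      <item>chicago</item>
      <item>miami</item>
   </one-of>
   <!-- optional filler after the city, such as "please" -->
   <item repeat="0-1"><ruleref special="GARBAGE"/></item>
</rule>

The more speech a grammar is allowed to ignore, the more likely it is to accept an utterance the application did not intend, which is one reason to use GARBAGE with care.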