syntax

05/26/2009

[This is prerelease documentation and is subject to change in future releases. Blank topics are included as placeholders.]

This keyword specifies a syntax rule for parsing input to a domain specific language (DSL).

A syntax rule must contain a name, and one or more productions. Each production can consist of multiple terms.

The rule can optionally specify parameters, and attributes can be applied to it.

Each production can specify a constructor and a precedence.

Each term can also specify attributes, precedence, and variable binding.

This topic discusses the required components of a syntax rule. Sub-topics discuss the optional components of the rule, productions, and the terms in a production.

The first syntax section shows the basic form of the syntax keyword, with only the required elements. The second syntax section shows all optional components.

syntax RuleName = Productions;

<Optional Attributes> syntax RuleName <Optional Parameters> = Productions;

Productions : 
    Production  OR 
    Productions | Production

Production :
    <Optional Precedence>  Pattern <Optional Constructor>

Pattern :
    "empty" OR
    Terms

Terms :
    Term OR
    Terms Term

Term :
    "error" OR
    <Optional Attributes> <Optional Precedence> <Optional Variable Binding> Text Pattern

Text Pattern :
    TextLiteral OR
    CharacterRange OR
    Character with Kleene Operator OR
    Rule Reference OR
    In-line rules OR
    "any"

Rule Name

RuleName is any valid “M” identifier.

Productions

A rule contains one or more productions, separated by the "or" (|) operator. Each production consists of a pattern and an optional constructor, which is used to shape the output of the rule if the default output is not desired. A production can be prefixed with a precedence, which is used to resolve ambiguity: if two productions match the same text, the one with the higher precedence is used. A pattern can specify variable bindings, which can be referenced in the constructor.

Pattern

A pattern is a sequence of terms, or the system-defined pattern empty, which matches the empty string ("").
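
As a minimal sketch of the empty pattern, the following hypothetical language accepts either the string "Hello" or the empty string. The module, language, and rule names are illustrative only, and the sketch assumes that empty can stand alone as a production, as the Pattern form above suggests.

module OptionalHello {
    language OptionalHello {
        syntax Main
          = Hello
          | empty
          ;
        syntax Hello
          = "Hello";
    }
}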

Terms

A term can consist of one of the following:

A reference to another rule.
A text literal.
A range of characters.
An in-line rule, which is a rule with a range operator applied.
Characters with Kleene operators applied.
The literal any, which is a wildcard that matches any text value of length 1.

These terms can be combined into expressions using the difference, intersection, and inverse set operators.
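
The following sketch places several kinds of terms in a single production. It is illustrative only: the names are hypothetical, and it assumes that a character range is written with the .. operator, that Kleene operators are written as a postfix (+ here), and that // starts a comment. None of these notations is shown elsewhere in this topic.

module TermKinds {
    language TermKinds {
        syntax Main
          = Hello        // a reference to another rule
            ","          // a text literal
            "a".."z"     // a range of characters (assumed notation)
            "!"+         // a character with a Kleene operator (assumed notation)
            any          // wildcard: any text value of length 1
          ;
        syntax Hello
          = "Hello";
    }
}

Under these assumptions, an input such as "Hello,x!!?" would conform to Main.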

Remarks

The top-most syntax rule must be named Main.

A rule commonly contains one or more productions, and each production is made up of multiple terms. Very often one or more of the terms reference another rule, which gives the language a hierarchical tree structure. The leaves of the tree often consist of text literals, character ranges, or token rules.

If an input text value conforms to more than one production in a rule, then the rule is ambiguous.
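
As a minimal sketch of an ambiguous rule, consider the following hypothetical language. The names are illustrative only. The input "Hello" conforms to both productions of Main, because Greeting and Salutation match the same text.

module Ambiguous {
    language Ambiguous {
        syntax Main
          = Greeting
          | Salutation
          ;
        syntax Greeting
          = "Hello";
        syntax Salutation
          = "Hello";
    }
}

Giving one of the two productions a higher precedence is the mechanism described under Productions for resolving this kind of ambiguity.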

Example

The following code recognizes character strings of the form "Hello{,Hello}*", that is, the string "Hello" followed by zero or more occurrences of the string ",Hello".

Note that several of the syntax rules contain terms that are rule references. The HelloList rule shows a common pattern for expressing one or more occurrences of something: its second production refers back to HelloList itself.

Finally, note that the Hello rule, which is at the bottom of the syntax tree, would be a token rule in most languages. In this sample, the language is small enough that it makes no difference.

module HelloWorld {
    language HelloWorld {
        syntax Main 
          = HelloList;
        syntax HelloList 
          = Hello
          | HelloList "," Hello
          ;
        syntax Hello 
          = "Hello";
    }
}
