The Microsoft code name "M" Modeling Language Specification - Productions

November 2009

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see theModel Citizen blog.]

 [This documentation targets the Microsoft SQL Server Modeling CTP (November 2009) and is subject to change in future releases. Blank topics are included as placeholders.]

Sections:
1: Introduction to "M"
2: Lexical Structure
3: Text Pattern Expressions
4: Productions
5: Rules
6: Languages
7: Types
8: Computed and Stored Values
9: Expressions
10: Module
11: Attributes
12: Catalog
13: Standard Library
14: Glossary

4 Productions

A production is a pattern and an optional constructor. Each production is a scope. The pattern may establish variable bindings which can be referenced in the constructor. A production can be qualified with a precedence that is used to resolve a tie if two productions match the same text (See §4.4.1).

syntax ProductionDeclaration

    = ProductionPrecedence? PatternDeclaration Constructor?;

syntax Constructor

    = "=>" TermConstructor;

syntax ProductionPrecedence

    = "precedence" IntegerLiteral  ":";

4.1 Pattern Declaration

A pattern declaration is a sequence of term declarations or the built-in pattern empty which matches "". 

syntax PatternDeclaration 

    = "empty"

    | TermDeclarations?;

syntax TermDeclarations

    = TermDeclaration

    | TermDeclarations TermDeclaration;

4.2 Term Declaration

A term declaration consists of a pattern expression with an optional variable binding, precedence and attributes. The built-in term error is used for error recovery and discussed in §4.2.1.

syntax TermDeclaration

    = "error"

    | Attributes? TermPrecedence? VariableBinding? TextPatternExpression;

syntax VariableBinding

    = Name ":";

syntax TermPrecedence

    = "left" "(" IntegerLiteral ")"

    | "right" "(" IntegerLiteral ")";

A variable associates a name with the output from a term which can be used in the constructor. Precedence is discussed in §4.4.2.

The error term is used to facilitate error recovery. This is described in §4.2.1.

4.2.1 Error

The error production enables error recover. Consider the following example:

module HelloWorld {

    language HelloWorld {

        syntax Main

          = HelloList;

        token Hello

          = "Hello";

        checkpoint syntax HelloList

          = Hello

          | HelloList "," Hello

          | HelloList "," error;

    }

}

The language recognizes the text "Hello,Hello,Hello" as expected and produces the following default output:

Main[

  HelloList[

    HelloList[

      HelloList[

        Hello

      ],

      ,,

      Hello

    ],

    ,,

    Hello

  ]

]

The text "Hello,hello,Hello" is not in the language because the second "h" is not capitalized (and case sensitivity is true). However, rather than stop at "h", the language processor matches "h" to the error token, then matches "e" to the error token, etc. until it reaches the comma. At this point the text conforms to the language and normal processing can continue. The language process reports the position of the errors and produces the following output:

Main[

  HelloList[

    HelloList[

      HelloList[

        Hello

      ],

      error["hello"],

    ],

    ,,

    Hello

  ]

]

Hello occurs twice instead of three times as above and the text the error token matched is returned as error["hello"].

4.3 Constructors

A term constructor is the syntax for defining the output of a production. A node in a term constructor can be:

  • An atom consisting of
    • A literal
    • A reference to another term
    • An operation on a reference
  • A ordered collection of successors with an optional label
  • An unordered collection of successors with an optional label

The following grammar mirrors this structure.

syntax TermConstructor

    = TopLevelNode;

syntax Node

    = Atom

    | OrderedTerm

    | UnorderedTerm;

syntax TopLevelNode

    = TopLevelAtom

    | OrderedTerm

    | UnorderedTerm;

syntax Nodes

    = Node

    | Nodes "," Node;

syntax OrderedTerm

    = Label? "[" Nodes? "]";

syntax UnorderedTerm

    = Label? "{" Nodes? "}";

syntax Label

    = Identifier

    | "id" "(" Atom ")";

syntax Atom

    = TopLevelAtom

    | "valuesof" "(" VariableReference ")";

syntax TopLevelAtom

    = TextLiteral

    | DecimalLiteral

    | LogicalLiteral

    | IntegerLiteral

    | NullLiteral

    | VariableReference

    | "labelof" "(" VariableReference ")";

syntax VariableReference

    = Identifier;

Each production defines a scope. The variables referenced in a constructor must be defined within the same production's pattern. Variables defined in other productions in the same rule cannot be referenced. The same variable name can be used across alternatives in the same rule.

Consider three alternatives for encoding the output of the same production. First, the default constructor (See §4.3.2):

module Expression {

    language Expression {

        token Digits = ("0".."9")+;

        syntax Main = E;

        syntax E

          = Digits

          | E "+" E ;

    }

}

Processing the text "1+2" yields:

Main[E[E[1], +, E[2]]]

This output reflects the structure of the grammar and may not be the most useful form for further processing. The second alternative cleans the output up considerably:

module Expression {

    language Expression {

        token Digits = ("0".."9")+;

        syntax Main

          = e:E => e;

        syntax E

          = d:Digits => d

          | l:E "+" r:E => Add[l,r] ;

    }

}

Processing the text "1+2" with this language yields:

Add[1, 2]

This grammar uses three common patterns.

  • Productions with a single term are passed through. This is done for the single production in Main and the first production in E.
  • A label, Add, is used to designate the operator.
  • Position is used to distinguish the left and right operand.

The third alternative uses a record like structure to give the operands names:

module Expression {

    language Expression {

        token Digits = ("0".."9")+;

        syntax Main

          = e:E => e;

        syntax E

          = d:Digits => d

          | l:E "+" r:E => Add{Left{l},Right{r}} ;

    }

}

Processing the text "1+2" with this language yields:

Add{Left{1}, Right{2}}

Although somewhat more verbose than the prior alternative, this output does not rely on ordering and forces consumers to explicitly name Left or Right operands. Although either option works, this has proven to be more flexible and less error prone.

4.3.1 Constructor operators

Constructor operators allow a constructor to use a variable reference as a label, extract the successors of a variable reference or extract the label of a variable reference.

Consider generalizing the example above to support multiple operators. This could be done by adding a new production for each operator -, *, /, ^. Alternatively a single rule can be established to match these operators and the output of that rule can be used as a label using id:

module Expression {

    language Expression {

        token Digits = ("0".."9")+;

        syntax Main

          = e:E => e;

        syntax Op

          = "+" => "Add"

          | "-" => "Subtract"

          | "*" => "Multiply"

          | "/" => "Divide" ;

        syntax E

          = d:Digits => d

          | l:E o:Op r:E => id(o){Left[l],Right[r]} ;

    }

}

Processing the text "1+2" with this language yields the same result as above. Processing

"1 / 2" yields:

Divide{Left{1}, Right{2}}

This language illustrates the id operator. See §4.4.2 for a more realistic expression grammar.

The valuesof operator extract the successors of a variable reference. It is used to flatten nested output structures. Consider the language:

module Digits {

    language Digits {

        syntax Main = DigitList ;

        token Digit = "0".."9";

        syntax DigitList

          = Digit

          | DigitList "," Digit ;

    }

}

Processing the text "1, 2, 3" with this language yields:

Main[

  DigitList[

    DigitList[

      DigitList[

        1

      ],

      ,,

      2

    ],

    ,,

    3

  ]

]

The following grammar uses valuesof and the pass through pattern above to simplify the output:

module Digits {

    language Digits {

        syntax Main = dl:DigitList => dl ;

        token Digit = "0".."9";

        syntax DigitList

          = d:Digit => DigitList[d]

          | dl:DigitList "," d:Digit => DigitList[valuesof(dl),d] ;

    }

}

Processing the text "1, 2, 3" with this language yields:

DigitList[1, 2, 3]

This output represents the same information more concisely.

4.3.2 Default Constructor

If a constructor is not defined for a production the language process defines a default constructor. For a given production, the default projection is formed as follows:

(1)           The label for the result is the name of the production’s rule.

(2)          The successors of the result are an ordered sequence constructed from each term in the pattern.

(3)          * and ? create an unlabeled sequence with the elements.

(4)          ( ) results in an anonymous definition. 

a.            If it contains constructors ( a:A => a), then the output is the output of the constructor. 

b.            If there are no constructors, then the default rule applied on the anonymous definition and the output is enclosed in square brackets  [ A’s result ]

(5)          Token rules do not permit a constructor to be specified and output text values.

(6)          Interleave rules do not permit a constructor to be specified and do not produce output.

Consider the following language:

module ThreeDigits {

    language ThreeDigits {

        token Digit = "0".."9";

        syntax Main

          = Digit Digit Digit ;

    }

}

Given the text "123" the default output of the language processor follows:

Main[

  1,

  2,

  3

]

4.4 Precedence

The M language processor is tolerant of such ambiguity as it is recognizing subsequences of text. However, it is an error to produce more than one output for an entire text value. Precedence qualifiers on productions or terms determine which of the several outputs should be returned.

4.4.1 Production Precedence

Consider the classic dangling else problem as represented in the following language:

module IfThenElse {

    language IfThenElse {

        syntax Main = S;

        syntax S

          = empty

          | "if" E "then" S

          | "if" E "then" S "else" S;

        syntax E = empty;

        interleave Whitespace = " ";

    }

}

Given the input "if then if then else", two different output are possible. Either the else binds to the first if-then:

if

then

    if

    then

else

Or it binds to the second if-then:

if

then

    if

    then

    else

The following language produces the output immediately above, binding the else to the second if-then.

module IfThenElse {

    language IfThenElse {

        syntax Main = S;

        syntax S

          = empty

          | precedence 2: "if" E "then" S

          | precedence 1: "if" E "then" S "else" S;

        syntax E = empty;

        interleave Whitespace = " ";

    }

}

Switching the precedence values produces the first output. 

4.4.2 Term Precedence

Consider a simple expression language which recognizes:

2 + 3 + 4

5 * 6 * 7

2 + 3 * 4

2 ^ 3 ^ 4

The result of these expressions can depend on the order in which the operators are reduced. 2 + 3 + 4 yields 9 whether 2 + 3 is evaluated first or 3 + 4 is evaluated first. Likewise, 5 * 6 * 7 yields 210 regardless of the order of evaluation.

However, this is not the case for 2 + 3 * 4. If 2 + 3 is evaluated first yielding 5, 5 * 4 yields 20. While if 3 * 4 is evaluated first yielding 12, 2 + 12 yields 14. This difference manifests itself in the output of the following grammar:

module Expression {

    language Expression {

        token Digits = ("0".."9")+;

        syntax Main = e:E => e;

        syntax E

          = d:Digits => d

          | "(" e:E ")" => e

          | l:E "^" r:E => Exp[l,r]

          | l:E "*" r:E => Mult[l,r]

          | l:E "+" r:E => Add[l,r];

        interleave Whitespace = " ";

    }

}

"2 + 3 * 4" can result in two outputs:

Mult[Add[2, 3], 4]

Add[2, Mult[3, 4]]

According to the rules we all learned in school, the result of this expression is 14 because multiplication is performed before addition. This is expressed in M by assigning "*" a higher precedence than "+". In this case the result of an expression changed with the order of evaluation of different operators.

The order of evaluation of a single operator can matter as well. Consider 2 ^ 3 ^ 4. This could result in either 8 ^ 4 or 2 ^ 81. In term of output, there are two possibilities:

Exp[Exp[2, 3], 4]

Exp[2, Exp[3, 4]]

In this case the issue is not which of several different operators to evaluate first but which in a sequence of operators to evaluate first, the leftmost or the right most. The rule in this case is less well established but most languages choose to evaluate the rightmost "^" first yielding 2 ^ 81 in this example.

The following grammar implements these rules using term precedence qualifiers. Term precedence qualifiers may only be applied to literals or references to token rules. 

module Expression {

    language Expression {

        token Digits = ("0".."9")+;

        syntax Main = E;

        syntax E

          = d:Digits => d

          | "(" e:E ")" => e

          | l:E right(3) "^" r:E => Exp[l,r]

          | l:E left(2) "*" r:E => Mult[l,r]

          | l:E left(1) "+" r:E => Add[l,r];

        interleave Whitespace = " ";

    }

}

"^" is qualified with right(3). right indicates that the rightmost in a sequence should be grouped together first. 3 is the highest precedence, so "^" will be grouped most strongly.