Building a Simple Recursive Descent Parser (Completed Simple Parser)

In this post, I complete the recursive descent parser that will parse the simple grammar that I presented previously in this series.  In this post, I’ll examine in detail the NospaceExpression class, which needs to implement operator precedence.

This blog is inactive.
New blog: EricWhite.com/blog

Blog TOCNote:  I have to apologize for the delay in writing this post.  First summer vacation took some time, and then I became involved in a project of writing two papers – one that summarizes SharePoint application development, and another that summarizes Office Client application development.  I’ve been literally obsessing about every possible way that you as a developer can extend SharePoint Server 2010, and every way that you can extend the Office 2010 client applications.  The SharePoint paper is nearing publication.  The Office Client paper will be along in a bit.  While obsessing, it has been difficult for me to mentally switch tracks back to this series.  But here we go again…

This post is one in a series on using LINQ to write a recursive-descent parser for SpreadsheetML formulas.  You can find the complete list of posts here.

The NospaceExpression Class

Here is the entire implementation of the NospaceExpression class.  I explained all of the code that has a gray background in the previous post, Building a Simple Recursive Descent Parser (continued).

public static NospaceExpression Produce(IEnumerable<Symbol> symbols)
{
// nospace-expression = open-parenthesis expression close-parenthesis
// / numerical-constant
// / prefix-operator expression
// / expression infix-operator expression

if (!symbols.Any())
return null;

if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
{
Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
if (e != null)
return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
}

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
if (t is OpenParenthesis)
return d + 1;
if (t is CloseParenthesis)
return d - 1;
return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
Symbol = s,
Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
SymbolWithIndex = v1,
Depth = v2,
});
var operatorList = z2
.Where(x => x.Depth == 0 &&
x.SymbolWithIndex.Index != 0 &&
InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
.ToList();
if (operatorList.Any())
{
int minPrecedence = operatorList
.Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
var op = operatorList
.Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
if (op != null)
{
var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
Expression e1 = Expression.Produce(expressionTokenList1);
if (e1 == null)
throw new ParserException("Invalid expression");
var expressionTokenList2 = symbols
.SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
Expression e2 = Expression.Produce(expressionTokenList2);
if (e2 == null)
throw new ParserException("Invalid expression");
InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
return new NospaceExpression(e1, io, e2);
}
}

NumericalConstant n = NumericalConstant.Produce(symbols);
if (n != null)
return new NospaceExpression(n);

PrefixOperator p = PrefixOperator.Produce(symbols.FirstOrDefault());
if (p != null)
{
Expression e = Expression.Produce(symbols.Skip(1));
if (e != null)
return new NospaceExpression(p, e);
}

return null;
}

Let’s work through the new code.  Let’s consider how the new code behaves when parsing the following formula:

"1+((2+3)*4)^5"

When determining if an expression has an infix operator, we specifically need to ignore operators that are in parentheses.  Those operators will have their chance to be recognized when NospaceExpression.Produce sees that an expression starts with an open parenthesis and ends with a close parenthesis, as processed by this code:

if (!symbols.Any())
return null;

if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
{
Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
if (e != null)
return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
}

So we need to ignore operators that are inside parentheses.  The easiest way to determine if an operator is inside of parentheses is to use the Rollup extension method, as follows:

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
if (t is OpenParenthesis)
return d + 1;
if (t is CloseParenthesis)
return d - 1;
return d;
});

This use of the Rollup extension method returns a new collection of integers that has the same number of elements in it as the source collection.  The integer is the depth of parentheses at that point.  In the following snippet, I add a bit of code to print the items in the collection, and then exit:

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
if (t is OpenParenthesis)
return d + 1;
if (t is CloseParenthesis)
return d - 1;
return d;
});
foreach (var item in z)
Console.WriteLine(item);
Environment.Exit(0);

When I run this example for the test formula "1+((2+3)*4)^5", I see:
0
0
1
2
2
2
2
1
1
1
0
0
0

When looking for an infix operator, I can ignore all operators where the depth is not zero.

Later on, I’m going to need the index of the symbol of the infix operator that we find.  The next projection uses symbols as its source, and projects an anonymous type that includes the symbol and its index:

var symbolsWithIndex = symbols.Select((s, i) => new
{
Symbol = s,
Index = i,
});
foreach (var item in symbolsWithIndex)
Console.WriteLine(item);
Environment.Exit(0);

When I run this for the test formula, I see:

{ Symbol = 1, Index = 0 }
{ Symbol = +, Index = 1 }
{ Symbol = (, Index = 2 }
{ Symbol = (, Index = 3 }
{ Symbol = 2, Index = 4 }
{ Symbol = +, Index = 5 }
{ Symbol = 3, Index = 6 }
{ Symbol = ), Index = 7 }
{ Symbol = *, Index = 8 }
{ Symbol = 4, Index = 9 }
{ Symbol = ), Index = 10 }
{ Symbol = ^, Index = 11 }
{ Symbol = 5, Index = 12 }

And next, use the Zip extension method to zip together the last two queries to create a list of items that contains:

·         The symbol

·         Its index

·         Its parenthesis depth

var z = symbols.Rollup(0, (t, d) =>
{
if (t is OpenParenthesis)
return d + 1;
if (t is CloseParenthesis)
return d - 1;
return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
Symbol = s,
Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
SymbolWithIndex = v1,
Depth = v2,
});
foreach (var item in z2)
Console.WriteLine(item);
Environment.Exit(0);

This produces:

{ SymbolWithIndex = { Symbol = 1, Index = 0 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = +, Index = 1 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = (, Index = 2 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = (, Index = 3 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = 2, Index = 4 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = +, Index = 5 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = 3, Index = 6 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = ), Index = 7 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = *, Index = 8 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = 4, Index = 9 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = ), Index = 10 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = ^, Index = 11 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = 5, Index = 12 }, Depth = 0 }

Next, we can look for any infix operators where the Depth == 0.  In addition, we filter out operators that are found in the very first position, because that will not be an infix operator, but instead a prefix operator.

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
if (t is OpenParenthesis)
return d + 1;
if (t is CloseParenthesis)
return d - 1;
return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
Symbol = s,
Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
SymbolWithIndex = v1,
Depth = v2,
});
var operatorList = z2
.Where(x => x.Depth == 0 &&
x.SymbolWithIndex.Index != 0 &&
InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
.ToList();
foreach (var item in operatorList)
Console.WriteLine(item);
Environment.Exit(0);

This produces a list of the two possible operators that are part of an expression that uses an infix operator:

{ SymbolWithIndex = { Symbol = +, Index = 1 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = ^, Index = 11 }, Depth = 0 }

Next, of the operators that could possibly be the infix operator for this expression, we find the minimum precedence of any operators.  When implementing a top-down parser, we process operators with the least precedence before processing operators of higher precedence.  When looking at the expression tree, operators with the least precedence are nearer to the root of the tree.  Operators with the highest precedence will be closer to the leaves, so that they will be evaluated before the operators with lower precedence.  If this isn’t clear, I think that it will be when you see the entire parse tree.  In this particular case, we need to find the + operator, not the ^ operator, as + has lower precedence.

To enable processing of precedence of operators, I defined a small dictionary of operators and their precedence:

public static Dictionary<string, int> OperatorPrecedence = new Dictionary<string, int>()
{
{ "^", 3 },
{ "*", 2 },
{ "/", 2 },
{ "+", 1 },
{ "-", 1 },
};

In the following listing, the highlighted line finds the minimum precedence of the operators that are in play.

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
if (t is OpenParenthesis)
return d + 1;
if (t is CloseParenthesis)
return d - 1;
return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
Symbol = s,
Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
SymbolWithIndex = v1,
Depth = v2,
});
var operatorList = z2
.Where(x => x.Depth == 0 &&
x.SymbolWithIndex.Index != 0 &&
InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
.ToList();
if (operatorList.Any())
{
int minPrecedence = operatorList
.Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
var op = operatorList
.Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
if (op != null)
{
var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
Expression e1 = Expression.Produce(expressionTokenList1);
if (e1 == null)
&nbs