A close look at the Razor Parse Tree
This is part of a series in which we take a deep dive into the inner workings and extensibility of the Razor parser. In this post we take a detailed look at the parse tree that the parser generates.
To learn more about how the parser works, please see the post by Andrew.
In the Razor parser, the document starts in markup mode (MarkupBlock). Depending on what comes next, the parser switches between markup mode and code mode. This post takes you through the different kinds of markup/code switches that can happen. There is also a downloadable tool that generates a parse tree from Razor syntax input. The tool works for both C# and VB.
So let's dive into the building blocks.
At a high level, the parse tree is made up of Blocks, which indicate what kind of construct the parser is parsing. Blocks are the non-leaf nodes of the parse tree. Blocks can contain other blocks, and every branch eventually terminates in Spans.
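To make the Block/Span relationship concrete, here is a minimal Python sketch of the tree's shape. The Block and Span classes and the iter_spans function are illustrative stand-ins, not the Razor parser's own types. The hand-built tree mirrors the sample parse tree later in this post, and reading its spans left to right gives back the original input.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Span:
    """Leaf node: a run of characters of a single kind (illustrative model)."""
    kind: str     # e.g. "Markup", "Transition", "MetaCode", "Code", "Comment"
    content: str  # the exact text this span covers


@dataclass
class Block:
    """Non-leaf node: groups child blocks and spans under one block type."""
    block_type: str  # e.g. "Markup", "Expression", "Statement"
    children: List[Union[Block, Span]] = field(default_factory=list)


def iter_spans(node: Union[Block, Span]) -> Iterator[Span]:
    """Yield the leaf spans in document order (depth-first, left to right)."""
    if isinstance(node, Span):
        yield node
        return
    for child in node.children:
        yield from iter_spans(child)


# Hand-built tree for "1 + 1 = @(1+1)" followed by a CR/LF newline.
tree = Block("Markup", [
    Span("Markup", "1 + 1 = "),
    Block("Expression", [
        Span("Transition", "@"),
        Span("MetaCode", "("),
        Span("Code", "1+1"),
        Span("MetaCode", ")"),
    ]),
    Span("Markup", "\r\n"),
])

print([s.kind for s in iter_spans(tree)])
# ['Markup', 'Transition', 'MetaCode', 'Code', 'MetaCode', 'Markup']
print("".join(s.content for s in iter_spans(tree)) == "1 + 1 = @(1+1)\r\n")
# True
```

Keeping text only in the leaves is what lets blocks nest to any depth in this model while every character still belongs to exactly one span.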
The following diagram shows the high-level structure of the Razor parse tree.
BLOCKS
Blocks can be of the following types:
Block Type | Explanation | Example |
Markup | The type in which the parser starts parsing the document. It contains HTML markup text. | Hello world |
Section | The parser is parsing the @section construct. | @section Foo{} |
Template | The parser is parsing an inline template, such as the ones defined for the WebGrid helper. | |
Statement | The parser is parsing a statement block. | @{ int x=1; } |
Directive | The parser is parsing a top-level directive of the page. | @using Microsoft.Web.Helpers |
Functions | The parser is parsing the functions block in the file. | @functions{} |
Expression | The parser is parsing an expression block. | @System.DateTime.Now |
Helper | The parser is parsing the definition of a @helper construct. | @helper HelperName(){} |
Comment | The parser is parsing a comment block, which can comment out both markup and code. | @* comments which have markup and code Markup: Hi Code: @{} *@ |
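The table can be read as a rough lookup from whatever follows the @ to a block type. The sketch below uses a hypothetical classify_after_at function and is deliberately naive: the real parser works from C# or VB tokens and surrounding context (templates, for example, only make sense inside code), and treating keywords such as @if as statements is an assumption of this sketch.

```python
import re

# Hypothetical keyword table; "@using" is always treated as a directive here,
# although "@using (...)" would be a C# statement in a real page.
KEYWORD_BLOCKS = {
    "section": "Section",
    "helper": "Helper",
    "functions": "Functions",
    "using": "Directive",
}

# Assumption: keyword-led code such as "@if (...) { }" is a statement block.
STATEMENT_KEYWORDS = {"if", "for", "foreach", "while", "switch", "do", "try", "lock"}


def classify_after_at(text: str) -> str:
    """Guess the block type for a Razor construct from the text after '@'."""
    if not text.startswith("@"):
        return "Markup"
    rest = text[1:]
    if rest.startswith("*"):
        return "Comment"
    if rest.startswith("{"):
        return "Statement"
    if rest.startswith("<"):
        return "Template"
    if rest.startswith("("):
        return "Expression"
    match = re.match(r"[A-Za-z_][A-Za-z0-9_]*", rest)
    if match is None:
        return "Markup"  # not an identifier, so no code transition here
    word = match.group(0)
    if word in KEYWORD_BLOCKS:
        return KEYWORD_BLOCKS[word]
    if word in STATEMENT_KEYWORDS:
        return "Statement"
    return "Expression"  # e.g. @System.DateTime.Now


for sample in ["@section Foo{}", "@{ int x=1; }", "@using Microsoft.Web.Helpers",
               "@functions{}", "@System.DateTime.Now", "@helper HelperName(){}",
               "@* comment *@"]:
    print(sample, "->", classify_after_at(sample))
```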
Each block contains the following information:
BlockType: One of the above kinds
SourceLocation: Location of the character in the file where the block starts, in the form (AbsoluteIndex:LineIndex,CharIndex)::Length of Block
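The location format is easy to reproduce. This sketch assumes zero-based line and character indexes with lines counted by newline characters, which matches the sample dump later in this post; the SourceLocation class and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    absolute_index: int  # offset from the start of the file
    line_index: int      # zero-based line number
    char_index: int      # zero-based column within that line


def location_of(text: str, absolute_index: int) -> SourceLocation:
    """Compute the zero-based line and column of an absolute offset."""
    prefix = text[:absolute_index]
    # Assumption: lines are counted by "\n", so "\r\n" counts as one break.
    line_index = prefix.count("\n")
    line_start = prefix.rfind("\n") + 1  # 0 when no newline precedes the offset
    return SourceLocation(absolute_index, line_index, absolute_index - line_start)


def format_location(loc: SourceLocation, length: int) -> str:
    """Render a location as (AbsoluteIndex:LineIndex,CharIndex)::Length."""
    return f"({loc.absolute_index}:{loc.line_index},{loc.char_index})::{length}"


source = "1 + 1 = @(1+1)\r\n"
print(format_location(location_of(source, 0), len(source)))  # (0:0,0)::16
print(format_location(location_of(source, 8), 6))            # (8:0,8)::6
print(location_of("a\r\nbc", 4))
# SourceLocation(absolute_index=4, line_index=1, char_index=1)
```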
SPANS
Blocks are ultimately divided into the following kinds of Spans. Think of Spans as the leaf nodes of the parse tree.
Each span records its position (line, column) and the content being parsed. This is useful for error reporting and for syntax highlighting in the editor.
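Because each span records where it starts and what it contains, a tool can map an error position or the editor's caret back to a span, for example to decide how to color it. A minimal sketch, assuming a flat list of leaf spans sorted by offset; SpanInfo and span_at are hypothetical names, and the span data is copied from the sample parse tree later in this post.

```python
import bisect
from typing import List, NamedTuple, Optional


class SpanInfo(NamedTuple):
    kind: str     # span kind, e.g. "Markup" or "Code"
    start: int    # absolute index of the span's first character
    content: str  # the text the span covers


def span_at(spans: List[SpanInfo], offset: int) -> Optional[SpanInfo]:
    """Return the span covering `offset`, or None if no span covers it.

    Assumes `spans` is sorted by start offset and that spans do not overlap.
    """
    starts = [s.start for s in spans]
    i = bisect.bisect_right(starts, offset) - 1  # last span starting at or before offset
    if i < 0:
        return None
    candidate = spans[i]
    if offset < candidate.start + len(candidate.content):
        return candidate
    return None


# Leaf spans of the sample "1 + 1 = @(1+1)" plus a CR/LF, in document order.
spans = [
    SpanInfo("Markup", 0, "1 + 1 = "),
    SpanInfo("Transition", 8, "@"),
    SpanInfo("MetaCode", 9, "("),
    SpanInfo("Code", 10, "1+1"),
    SpanInfo("MetaCode", 13, ")"),
    SpanInfo("Markup", 14, "\r\n"),
]
print(span_at(spans, 11).kind)  # Code
print(span_at(spans, 16))       # None (past the end of the input)
```

Binary search keeps each lookup logarithmic in the number of spans, which matters when an editor queries the tree on every keystroke.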
SpanKind | Explanation | Example |
Transition | Signifies that the parser parsed the @ transition character. | @ in @{} |
MetaCode | The characters that start and end a block | {} in @{} for a Statement block; () in @() for an Expression block |
Comment | All content in the Comment block | foo in @* foo *@ is a Comment span |
Code | All the code inside a Statement/Expression block | System.DateTime.Now in @System.DateTime.Now |
Markup | All the markup content in a markup block | <p></p> in “@{} <p></p>” |
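The Transition, MetaCode, and Code kinds fall out naturally when you split an @{ } or @( ) block. The sketch below uses a hypothetical split_delimited helper that returns (kind, offset, content) tuples. It finds the closing character by simple depth counting, so unlike the real parser it does not understand strings, comments, or markup nested inside a @{ } block.

```python
from typing import List, Tuple

# Hypothetical table of the opening/closing MetaCode characters for the two
# delimited blocks in the table above.
DELIMITERS = {"{": "}", "(": ")"}


def split_delimited(text: str, start: int = 0) -> List[Tuple[str, int, str]]:
    """Split an '@{...}' or '@(...)' block beginning at `start` into spans.

    Returns (kind, absolute_start, content) tuples. The matching closer is
    found by counting nesting depth only.
    """
    opener = text[start + 1:start + 2]
    if text[start:start + 1] != "@" or opener not in DELIMITERS:
        raise ValueError("expected '@{' or '@(' at the start offset")
    closer = DELIMITERS[opener]
    depth = 0
    for i in range(start + 1, len(text)):
        if text[i] == opener:
            depth += 1
        elif text[i] == closer:
            depth -= 1
            if depth == 0:  # this closer balances the opening character
                return [
                    ("Transition", start, "@"),
                    ("MetaCode", start + 1, opener),
                    ("Code", start + 2, text[start + 2:i]),
                    ("MetaCode", i, closer),
                ]
    raise ValueError(f"no matching {closer!r} found")


for kind, offset, content in split_delimited("1 + 1 = @(1+1)", 8):
    print(kind, offset, repr(content))
```

The offsets printed for the sample (8, 9, 10, 13) line up with the Transition, MetaCode, Code, and MetaCode spans in the generated parse tree.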
Each span contains the following information:
SpanKind: One of the above kinds
SourceLocation: Location of the character in the file where the span starts, in the form (AbsoluteIndex:LineIndex,CharIndex)::Length of Span
Content: The content that was parsed as this span
I hope by now you have a high-level idea of the structure of the parse tree. Next, I will take a sample Razor input and walk you through the parse tree generated for it.
Sample Input for a Razor C# file
1 + 1 = @(1+1)
Generated Parse Tree
Markup Block at (0:0,0)::16
    - Markup Span [V;Any] at (0:0,0)::8 - [1 + 1 = ] (Document)
    - Expression Block at (8:0,8)::6
        - Transition Span [V;None] at (8:0,8)::1 - [@]
        - MetaCode Span [V;None] at (9:0,9)::1 - [(]
        - Code Span [V;Any] at (10:0,10)::3 - [1+1] - [Terminator: <>]
        - MetaCode Span [V;None] at (13:0,13)::1 - [)]
    - Markup Span [V;Any] at (14:0,14)::2 - [\r\n] (Document)
As you know, the Razor parser starts parsing with a MarkupBlock, so in this case the first block is the MarkupBlock. After creating the markup block, the parser sees that the next characters are markup, so it creates a MarkupSpan and puts all the markup content ("1 + 1 = ") in this span.
When the parser sees @, it knows that the next characters are code, so it creates an ExpressionBlock. After creating the ExpressionBlock, the parser parses the @ as a TransitionSpan, which means that we have transitioned from markup to code. An explicit expression has the form @(), so the parser parses the ( as a MetaCode span. The parser then parses the remaining characters as a CodeSpan until it reaches the matching closing ), which it parses as a MetaCode span.
After that closing MetaCode span, the ExpressionBlock has nothing else to parse, so the parser returns to markup mode and consumes the newline characters (\r\n) as a Markup span in the outer MarkupBlock.
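The walk-through above can be condensed into a toy parser. This is a minimal sketch, assuming the input contains only plain markup and explicit @( ) expressions; the functions are hypothetical. Its output leaves out the bracketed annotations in the real dump (such as [V;Any], (Document), and the terminator), because this sketch does not model what they mean. For the sample input it produces the same blocks, spans, offsets, and content as the tree above.

```python
from typing import List

# Hypothetical toy parser: handles only plain markup plus explicit "@( ... )"
# expressions, which is enough to rebuild the sample tree.


def location(text: str, index: int) -> str:
    # Zero-based line and column, counting lines by newline characters.
    prefix = text[:index]
    line = prefix.count("\n")
    column = index - (prefix.rfind("\n") + 1)
    return f"({index}:{line},{column})"


def show(content: str) -> str:
    # Print CR/LF the way the dump does: as the escapes \r and \n.
    return content.replace("\r", "\\r").replace("\n", "\\n")


def span(text: str, kind: str, start: int, end: int, level: int) -> str:
    # One dump line for a leaf span, indented four spaces per tree level.
    return (f"{'    ' * level}- {kind} Span at {location(text, start)}"
            f"::{end - start} - [{show(text[start:end])}]")


def parse(text: str) -> List[str]:
    """Return dump lines: a root Markup block whose children are markup
    spans and Expression blocks (Transition, MetaCode, Code, MetaCode)."""
    lines = [f"Markup Block at {location(text, 0)}::{len(text)}"]
    pos = 0
    while pos < len(text):
        at = text.find("@(", pos)
        if at == -1:
            at = len(text)
        if at > pos:  # markup mode: everything up to the next transition
            lines.append(span(text, "Markup", pos, at, 1))
        if at == len(text):
            break
        # Code mode: find the ")" that balances the opening "(".
        depth, close = 0, -1
        for i in range(at + 1, len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:
                    close = i
                    break
        if close == -1:
            raise ValueError("unterminated @( expression")
        lines.append(f"    - Expression Block at {location(text, at)}::{close + 1 - at}")
        lines.append(span(text, "Transition", at, at + 1, 2))
        lines.append(span(text, "MetaCode", at + 1, at + 2, 2))
        lines.append(span(text, "Code", at + 2, close, 2))
        lines.append(span(text, "MetaCode", close, close + 1, 2))
        pos = close + 1  # back to markup mode after the closing MetaCode
    return lines


print("\n".join(parse("1 + 1 = @(1+1)\r\n")))
```

Each switch in the loop corresponds to a step in the walk-through: markup until @, a Transition span, MetaCode for the delimiters, Code in between, and then markup again.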
ParseTreeViewer
If you found the above description of the parse tree generated for Razor syntax interesting, then you should download this tool, which generates the parse tree for a given Razor input.
The tool can also be used to debug your application, though I would call that an advanced use. If you think the parser is not parsing your input as expected, you can use the tool to see the parse tree that gets generated and figure out what is wrong.
Screenshot of the tool
FAQ
1. You need to have ASP.NET Web Pages installed on the machine.
2. If you select the “View In Browser” option, the tool generates a temporary file named “test.htm”.
Hopefully this helps you understand the structure of the parse tree generated for Razor syntax.