Invent your own language… using Oslo. Part I
Alsalam alikom wa ra7mat Allah wa barakatoh (Peace Upon You)
Part I: Create the Grammar
…What your users will write…
If you write enterprise software, you have probably run into this problem many times: you want to give IT admins the ability to customize your application through some sort of script. The typical solution used to be VBScript on Windows, and now PowerShell. Both do a great job of exposing your public APIs for others to consume. However, neither lets you tailor the syntax to your application’s needs.
If you are writing software that does some reporting and want to let users decide which template to use for each customer… wouldn’t it be nice to let them write something like this:
Send "D:\Reports\Templates\Regular.tmpl" to sm1@hotmail.com, sm2@hotmail.com
Send "D:\Reports\Templates\Special.tmpl" to sm3@hotmail.com
Or how about a language that describes mathematical equations such as this:
[[x, y] ; [1, 2] ; [2, x + 5]] * Func(2, 3, 5) - x2 + y * j ^ 3 + d - mul(3)
This may look similar to MATLAB syntax… in fact, it is. The first part, “[[x, y] ; [1, 2] ; [2, x + 5]]”, declares a 3x2 matrix (three rows of two elements each). You can think of all sorts of symbols and operations that could go into an equation… LaTeX and MathML come to mind. LaTeX is not only for representing equations, but that is one of its features. My point is, even if you want to reuse one of the existing languages out there in your application, you don’t have to write a parser from scratch, buy a commercial package, or adopt an OSS library whose license might limit how you can distribute your software. MGrammar (included in Oslo) is one elegant way to achieve this…
Or maybe you just don’t like the brackets () in C# and want to define a bracketless C# (VB maybe 8-) )…
There might be a debate about whether each app should have its own Domain Specific Language (DSL) or whether it is better to rely on a commonly known scripting language (like VBScript or PowerShell). I won’t go into that debate here… I will just say, “it depends”.
Let’s get to work.
The first thing you need to do is install Oslo (available from Microsoft Downloads). It needs SQL Server (the free Express edition works fine). I installed it without SQL Server and it gave me an error, but that is OK as long as you don’t do anything that requires it.
Oslo comes with a toolchain. We will use only two of its tools: IntelliPad.exe (a GUI tool) and Mg.exe (a command-line tool).
We will use IntelliPad to write the MGrammar (.mg) file, test it on our example data, and make sure the syntax tree is in the right format.
We will then use the Mg tool to compile the MGrammar file into an .mgx file, which represents your compiled language parser. You can then use it from any .NET app.
We will make a grammar that recognizes the first language above; let’s call it Reporter. We will get to know MGrammar as we move on…
1. Open Notepad (or Notepad++, you geeky guys), type in this example, and save it as reportsData.bundle.
```
Send "D:\Reports\Templates\Regular.tmpl" to sm1@hotmail.com
Send
"D:\Reports\Templates\Special.tmpl" to sm2@hotmail.com, sm3@hotmail.com
```
This will work as our test data to make sure our grammar works fine.
Open IntelliPad and save the empty file as Reporter.mg (you have to type the extension yourself, or else IntelliPad won’t recognize the file as an MGrammar one).
Now close IntelliPad and double-click your file to reopen it in IntelliPad in MGrammar Mode (it’s a bug, if you ask me).
The top-level container in MG is the module element (much like a namespace or package).
Inside a module you can have different elements (language, type, etc.). Here we are interested in the language element.
Now go ahead and type your first lines:
```
module Basic.Languages
{
    language Reporter
    {
    }
}
```
From the MGrammar Mode menu, choose Tree Preview. This lets you see how the data matches as you write more code. It will ask you for an input file; choose the reportsData.bundle file we created in step 1.
Here is how it should look now:
Add the statement syntax Main = any*; to the language section. You will notice that the tree on the right gets populated with a long list of characters.
This statement is the entry point (Main). Here it tells Mg that the syntax you expect is “any*”, which matches any number of characters of any kind.
We know that every statement we have starts with “Send” and has a “to” in the middle, so we can refine the any* match into something more useful, like this:
syntax Main = "Send" any* "to" any*;
This instructs MG that you are expecting a word “Send” then some text then “to” then some text.
You will notice that the preview tree now has a little structure: a “Send” node with some characters inside, then a “to” node with the rest of the statement.
But there is an error saying that the second statement didn’t match anything.
This is because our Main rule matches only one instance.
We can define it like this:
syntax Main = ("Send" any* "to" any*)+;
Or we can be more decent and define another syntax rule, say SendCommand, like this:
```
syntax SendCommand = "Send" any* "to" any*;
syntax Main = SendCommand+;
```
As you might have guessed, + means 1 or more, * means 0 or more (same as in Regular Expressions) and ? means 0 or 1 occurrence.
To make things a little cleaner, we can define tokens for “Send” and “to”. Tokens can pretty much be looked at as the basic building blocks of a statement or a matching pattern.
Here is our code after defining the tokens:
```
token SendToken = "Send";
token ToToken = "to";
syntax SendCommand = SendToken any* ToToken any*;
syntax Main = SendCommand+;
```
We need to match the quoted string (the file path), but for now we won’t bother to validate whether it is a valid file path (left as an exercise for the reader).
So we will go ahead and define some more complex tokens for that:
```
token AlphaNumeric = 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-';
token Path = (AlphaNumeric | ':' | '\\' | '.')+;
token Email = (AlphaNumeric | '@' | '.')+;
token QuotedPath = '"' Path '"';
syntax SendCommand = SendToken QuotedPath ToToken Email;
```
'a'..'z' defines a character range (as you might have guessed); the rest is straightforward.
This is how the preview tree on the right should look by now.
Things are starting to look better, right? Maybe you are getting the feeling that this is just like regular expressions. In fact, it is pretty much the same concept (string matching, after all, if you want the truth), but MGrammar gives you many more options for writing your rules than regex matching does. You will also write almost zero lines of code to get your abstract syntax tree (AST) built ;)
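To make the regular-expression comparison concrete, here is a rough Python analogue of the token-level rules at this stage (SendToken QuotedPath ToToken Email). Python is not part of the Oslo toolchain, and the names ALNUM, PATH, EMAIL and match_send are made up for this illustration. It is only a sketch of the analogy, not of how Mg works internally.

```python
import re

# Regex counterparts of the AlphaNumeric, Path and Email tokens (illustrative only).
ALNUM = r"[A-Za-z0-9_-]"
PATH = rf"(?:{ALNUM}|[:\\.])+"
EMAIL = rf"(?:{ALNUM}|[@.])+"

# Counterpart of: SendToken QuotedPath ToToken Email
# The \s* parts stand in for whitespace between tokens.
SEND_COMMAND = re.compile(rf'\s*Send\s*"({PATH})"\s*to\s*({EMAIL})\s*')


def match_send(line):
    """Return (path, email) if the whole line is one Send command, else None."""
    m = SEND_COMMAND.fullmatch(line)
    return (m.group(1), m.group(2)) if m else None


# A single recipient matches.
print(match_send(r'Send "D:\Reports\Templates\Regular.tmpl" to sm1@hotmail.com'))
# A comma-separated list does not match yet, because no list rule exists.
print(match_send(r'Send "D:\Reports\Templates\Special.tmpl" to sm2@hotmail.com, sm3@hotmail.com'))
```

The regex only answers "match or no match" and hands back flat groups. Anything tree-shaped has to be assembled by hand, which is the part MGrammar gives you for free.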
You will notice IntelliPad is complaining about the ‘,’ in the second example line. It’s right: we didn’t define what a list of emails looks like.
```
syntax ListOfEmails = Email | ListOfEmails "," Email;
syntax SendCommand = SendToken QuotedPath ToToken ListOfEmails;
```
This is a recursive rule. It recognizes “a@b.com” and “a@b.com,c@d.com” but it doesn’t recognize “a@b.com,” in other words, the “,” must be between 2 emails… which is exactly what we want.
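If the recursion is hard to picture, the following Python sketch accepts the same inputs as the ListOfEmails rule and builds the same left-nested shape. It is a hand-written illustration under my own assumptions: parse_list_of_emails is a hypothetical helper, nodes are plain Python lists of the form [label, children...], and the left recursion is unrolled into a loop, which is a common trick in hand-written parsers and not necessarily what Mg does.

```python
import re

# Counterpart of: token Email = (AlphaNumeric | '@' | '.')+;
EMAIL = re.compile(r"[A-Za-z0-9_@.-]+")


def parse_list_of_emails(text):
    """Parse Email ("," Email)* and return a left-nested tree, or None on failure."""
    pos = 0

    def skip_whitespace():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_email():
        nonlocal pos
        skip_whitespace()
        m = EMAIL.match(text, pos)
        if m is None:
            return None
        pos = m.end()
        return m.group(0)

    first = read_email()
    if first is None:
        return None
    tree = ["ListOfEmails", first]           # base case: ListOfEmails = Email
    while True:
        skip_whitespace()
        if pos == len(text):
            return tree                       # consumed everything: success
        if text[pos] != ",":
            return None                       # unexpected character
        pos += 1
        email = read_email()
        if email is None:
            return None                       # a "," must be followed by an email
        # recursive case: ListOfEmails = ListOfEmails "," Email
        tree = ["ListOfEmails", tree, ",", email]


print(parse_list_of_emails("a@b.com"))
print(parse_list_of_emails("a@b.com,c@d.com"))
print(parse_list_of_emails("a@b.com,"))       # trailing comma is rejected
```

Each pass through the loop wraps the tree built so far in a new ListOfEmails node, which is why the preview tree nests the earlier emails deeper than the later ones.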
We are almost done. We now need to modify the resulting tree to make it look, well, better.
We need to fix the Send nodes so that “Send” and “to” don’t end up in the resulting tree. We will use projection: that is, we will define how we want our rules to be projected (output) into the syntax tree…
syntax SendCommand = SendToken p:QuotedPath ToToken list:ListOfEmails => Send[p, list];
We named the match result of QuotedPath p and that of ListOfEmails list. The projection Send[p, list] then creates a node called Send with only 2 children.
Here is the output:
```
Main[
  [
    Send[
      "\"D:\\Reports\\Templates\\Regular.tmpl\"",
      ListOfEmails[
        "sm1@hotmail.com"
      ]
    ],
    Send[
      "\"D:\\Reports\\Templates\\Special.tmpl\"",
      ListOfEmails[
        ListOfEmails[
          "sm2@hotmail.com"
        ],
        ",",
        "sm3@hotmail.com"
      ]
    ]
  ]
]
```
Now we need to remove the escaped quotes (\") before and after the path… they don’t look neat.
```
token QuotedPath = '"' p:Path '"' => Path[p];
syntax SendCommand = SendToken p:QuotedPath ToToken list:ListOfEmails => Send[valuesof(p), list];
```
The first modification creates a node for every path and removes the double quotes, like this:
Path[
“D:\…..”
]
valuesof(p) extracts the node’s contents and projects them into the tree. In this case the contents are the path wrapped in just one pair of double quotes (if you know what I mean).
You can go ahead and try to modify the email list rule so that it doesn’t project the “,” and projects all emails in one list rather than in nested lists.
Here is how the tree looks after all my modifications (including those left to you as an exercise :))
```
Commands[
  Send[
    "D:\\Reports\\Templates\\Regular.tmpl",
    Emails[
      "sm1@hotmail.com"
    ]
  ],
  Send[
    "D:\\Reports\\Templates\\Special.tmpl",
    Emails[
      "sm2@hotmail.com",
      "sm3@hotmail.com"
    ]
  ]
]
```
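To picture what flattening the nested email lists involves, here is a small Python sketch that turns the nested ListOfEmails shape from the earlier output into a single flat Emails node. The list-based node representation ([label, children...]) and the helper names values_of and flatten_emails are assumptions made for this illustration. values_of only mimics the effect described for valuesof above (taking a node's contents without the node itself) and is not Oslo's implementation.

```python
def values_of(node):
    """Return a node's children, dropping its label (node = [label, child, ...])."""
    return node[1:]


def flatten_emails(node):
    """Turn a nested ListOfEmails node into one flat Emails node without commas."""
    if len(node) == 2:
        # Shape: ["ListOfEmails", email]
        return ["Emails", node[1]]
    # Shape: ["ListOfEmails", nested_list, ",", email]
    _label, nested, _comma, email = node
    # Splice the already-flattened children in front of the new email.
    return ["Emails", *values_of(flatten_emails(nested)), email]


nested = [
    "ListOfEmails",
    ["ListOfEmails", "sm2@hotmail.com"],
    ",",
    "sm3@hotmail.com",
]
print(flatten_emails(nested))
```

The key step is the splice: the children of the inner list are lifted into the outer node instead of the inner node being kept as a child, and the "," is simply never copied over.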
Now, to compile the .mg file, open cmd, browse to the folder that contains reporter.mg, and type this:
"C:\Program Files\Microsoft Oslo SDK 1.0\Bin\mg.exe" reporter.mg
This will generate a reporter.mgx file in the current directory.
Now run:
"C:\Program Files\Microsoft Oslo SDK 1.0\Bin\mgx.exe" reportsData.bundle -r:reporter.mgx
This will generate the M structure (similar to an AST, with some key differences that we will explore in the next post) into a separate file. Voilà!
Conclusion:
- We used IntelliPad to write and test our MGrammar.
- MGrammar is a descriptive language for defining the grammar of any language.
- We need to define module and language sections in the .mg file.
- syntax Main is the starting point; syntax rules can be recursive.
- Tokens are the basic building elements.
- We use projection rules on syntax and token rules to reshape the resulting tree.
- MGrammar is awesome!
Congrats, you reached your first Checkpoint!
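Here is the full listing of the Reporter.mg file. Note the interleave rule at the end: it tells the parser to skip whitespace between tokens, which is what lets a command span several lines.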
```
module Basic.Languages
{
    language Reporter
    {
        token SendToken = "Send";
        token ToToken = "to";
        token AlphaNumeric = 'a'..'z' | 'A'..'Z' | '0'..'9';
        token Path = (AlphaNumeric | ':' | '\\' | '.')+;
        token Email = (AlphaNumeric | '@' | '.')+;
        token QuotedPath = '"' p:Path '"' => Path{p};
        syntax ListOfEmails = e:Email => Emails[e]
                            | list:ListOfEmails "," e:Email
                              => Emails[valuesof(list), e];
        syntax SendCommand = SendToken p:QuotedPath ToToken
                             list:ListOfEmails
                             => Send{valuesof(p), list};
        syntax Main = s:SendCommand+ => Main{Commands[valuesof(s)]};
        interleave whitespace = (" " | "\r" | "\n" | "\t")+;
    }
}
```
Part I: Create the Grammar (What your users write). [This post]
Part II: Consume the abstract syntax tree (Do some action!).
Part III: Compile your language into MSIL.
References:
A good tutorial for MGrammar: https://msdn.microsoft.com/en-us/library/dd441702.aspx
You will also find some good documents in C:\Program Files\Microsoft Oslo SDK 1.0\Documents
Two in particular are interesting:
MGrammar Language Specification.docx
MGrammar in a Nutshell.docx
Have a nice time!
Creating a .NET language by Haytham Abuel-Futuh is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Based on a work at blogs.msdn.com.
Comments
Anonymous
May 15, 2009
Interesting! Aside from the topic, I really like their Intellipad .. :D
Anonymous
May 15, 2009
:D Believe me, when it takes 600mb of ur RAM, you won't be that happy :D It's nice but seems it keeps unneeded references and hence leaks memory in some sense :D