Getting to the Crown Jewels

The crown jewels of the Database Edition product are the SQL parsers and script generator, these two pieces form the foundation of what the database project system does internally.


The parser, parses T-SQL code and turns it in to a script fragment, where the script generator takes a script fragment and turns it in to T-SQL code. This way we can roundtrip code between the two components. The script generator has the ability to format code on the way out, however there is no mode to preserve formatting or whitespace by the script generator at this point in time.

Per CTP16 of the GDR release the parser and script generator are public classes, so if you have any need to parse some SQL code, here is a solution.

The parser and script generator live in two assemblies. The Microsoft.Data.Schema.ScriptDom contains provider agnostic classes and the Microsoft.Data.Schema.ScriptDom.Sql assembly contain classes for the parser and script generator that are SQL Server specific.

Let's look at a very simple example on how to use the two components. The following screenshot shows the end result. A simple WinForms application that takes in a piece of T-SQL in this case the create statement of the authors table from the pubs database. The second text box shows the formatted result coming out of the script generator.


Lets digest what was needed to achieve this. I will not go in to the detail of the WinForms code since it is trivial and not that interesting, you can look at the sample code if you need to for that. I am going to focus on the usage of the parser and the script generator code.


The first step is to add the references to the required assemblies:


We first need to add two assembly references two:

  • Microsoft.Data.Schema.ScriptDom.dll
  • Microsoft.Data.Schema.ScriptDom.Sql.dll

Then we add two variables to the WinForms class to hold an instance of the parser and the script generator.

    1: private TSql100Parser _parser;
    2: private Sql100ScriptGenerator _scriptGen;

As you can see the parser and script generator are version specific implementations, there exists a TSql80Parser, TSql90Parser and TSql100Parser, the same hold true for the Sql<>ScriptGenerator classes. Normally you would want to instantiate the classes through a factory pattern, to abstract the version away, but in this simple example we will instantiate the versioned instances of the classes directly.  This happens in the form Load method.

    1: private void Form1_Load(object sender, EventArgs e)
    2: {
    3:     bool fQuotedIdenfifiers = false;
    4:     _parser = new TSql100Parser(fQuotedIdenfifiers);
    6:     SqlScriptGeneratorOptions options = new SqlScriptGeneratorOptions();
    7:     options.SqlVersion = SqlVersion.Sql100;
    8:     options.KeywordCasing = KeywordCasing.UpperCase;
   10:     _scriptGen = new Sql100ScriptGenerator(options);
   11: }

The only thing the parser needs to know is how to interpret string inside double quotes, are they strings or object identifiers? This is indicated when the parser class is constructed.

The script generator has many more options and takes and SqlScriptGeneratorOptions class instance as the constructor argument. The options class lets you define things like the version of the language and how to case keywords for examples. You will also find a set of indention properties that you can use to change the formatting results.

Now we are at the point we can do the hard work. When the user clicks the button the Click method will parse the SQL and script it back out to the second text box if there are no errors parsing the input code.

Lets step through the Click method:

    1: private void button1_Click(object sender, EventArgs e)
    2: {
    3:     textBox2.Text = String.Empty;
    4:     textBox3.Text = String.Empty;
    6:     string inputScript = textBox1.Text;
    7:     IScriptFragment fragment;
    8:     IList<ParseError> errors;
    9:     using (StringReader sr = new StringReader(inputScript))
   10:     {
   11:         fragment = _parser.Parse(sr, out errors);
   12:     }
   14:     if (errors != null && errors.Count > 0)
   15:     {
   16:         StringBuilder sb = new StringBuilder();
   17:         foreach (var error in errors)
   18:         {
   19:             sb.AppendLine(error.Message);
   20:             sb.AppendLine("offset " + error.Offset.ToString());
   21:         }
   22:         textBox3.Text = sb.ToString();
   23:     }
   24:     else
   25:     {
   26:         String script;
   27:         _scriptGen.GenerateScript(fragment, out script);
   28:         textBox2.Text = script;
   29:     }
   30: }

We first define to variables to hold the output of the parse operation, the IScriptFragment will hold the script fragment, this is the most important output of the parser, and it will also output an IList of ParseError objects when the parse operations ran in to problems.

Second we need to make the input to the parse available as a StreamReader, so we are converting the input string in to a StringReader class. Line 11 shows the actual call that parses the input stream and outputs the fragments and the errors.

Then we check if any errors occurred during the parse of the input. If so we iterate through the list of error objects and print out the message and offset information.

When there are no errors, we take the resulting script fragment and feed it back to the script generator. Line 27 shows the actual call. The output will be a formatted script.


You can download the sample code from the DataDude SkyDrive: 


Neat he? I hope this is useful!