July 2013

Volume 28 Number 7

Compilers - How Microsoft's Next-Gen Compiler Project Can Improve Your Code

By Jason Bock

I believe every developer wants to write good code. Nobody wants to create bug-ridden, unmaintainable systems that require endless hours to add features or fix problems. I’ve been on projects that felt like they were in a constant state of chaos, and they’re not fun. Long hours are lost in a code base that’s barely comprehensible due to inconsistent approaches. I like being on projects where layers are well-defined, unit tests are in abundance and build servers are constantly running to ensure everything works. Projects like that usually have a set of guidelines and standards in place that the developers follow.

I’ve seen teams put such guidelines in place. Maybe the developers are supposed to avoid calling certain methods in their code because they’ve been deemed problematic. Or maybe they want to make sure the code follows the same patterns in certain situations. For example, developers on projects may agree on standards like these:

  • No one should use local DateTime values. All DateTime values should be in Coordinated Universal Time (UTC).
  • The Parse method found on value types (such as int.Parse) should be avoided; int.TryParse should be used instead.
  • All of the entity classes created should support equality—that is, they should override Equals and GetHashCode and implement the == and != operators, and the IEquatable<T> interface.
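
As a rough illustration of what following these three rules looks like in code, here is a minimal sketch that uses only the base class library. The Customer and StandardsExamples names are hypothetical and exist only for this example.

```csharp
using System;

// Hypothetical entity, used only to illustrate the equality rule.
public sealed class Customer : IEquatable<Customer>
{
    public Customer(int id) { this.Id = id; }

    public int Id { get; private set; }

    // Strongly typed equality from IEquatable<Customer>.
    public bool Equals(Customer other)
    {
        return !ReferenceEquals(other, null) && this.Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        return this.Equals(obj as Customer);
    }

    // Must agree with Equals: equal customers yield equal hash codes.
    public override int GetHashCode()
    {
        return this.Id.GetHashCode();
    }

    public static bool operator ==(Customer left, Customer right)
    {
        if (ReferenceEquals(left, null)) { return ReferenceEquals(right, null); }
        return left.Equals(right);
    }

    public static bool operator !=(Customer left, Customer right)
    {
        return !(left == right);
    }
}

public static class StandardsExamples
{
    // TryParse reports failure through its return value instead of
    // throwing the way int.Parse does on malformed input.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }

    // UtcNow yields a DateTime whose Kind is DateTimeKind.Utc.
    public static DateTime Timestamp()
    {
        return DateTime.UtcNow;
    }
}
```

Calling StandardsExamples.ParseOrDefault with text that isn’t a number returns the fallback rather than throwing, and two Customer instances with the same Id compare equal through Equals, == and GetHashCode alike.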

I’m sure you’ve seen similar rules in a standards document. Consistency is a good thing, and if everyone follows the same practices, it becomes easier to maintain the code. The trick is to quickly express that knowledge to all the developers on the team in a reusable, effective way.

Code reviews are one way to find potential issues. It’s common for people with a fresh perspective on a given implementation to see issues that the original author is not aware of. Having another set of eyes review what you did can be beneficial, especially when the reviewer isn’t familiar with the work. However, it’s still easy to miss issues during development. Furthermore, code reviews are time-consuming—developers have to spend hours reviewing code and meeting with other developers to communicate the problems they find. I want a process that’s quicker. I want to know as soon as possible that I’ve done something wrong. Failing as fast as possible saves time and money in the long run.

There are tools in Visual Studio, such as Code Analysis, that can analyze your code and inform you of potential problems. Code Analysis has a number of predefined rules that can uncover cases where you haven’t disposed of an object, or where you have unused method arguments. Unfortunately, Code Analysis doesn’t run its rules until compilation is complete, and that’s not soon enough! I want to know as I’m typing that my new code violates my standards. That way I avoid committing code that can lead to numerous problems in the future. To do that, I need to be able to codify my rules such that they’re executed as I type, and that’s where the Microsoft “Roslyn” CTP comes into play.

What’s Microsoft “Roslyn”?

One of the best tools .NET developers can use to analyze their code is the compiler. It knows how to parse code into tokens, and turn those tokens into something that’s meaningful based on their placement in the code. Traditionally, though, the compiler’s only output is an assembly emitted to disk. There’s a lot of hard-won knowledge gleaned in the compilation pipeline that you’d love to be able to use, but, alas, that hasn’t been possible in the .NET world because the C# and Visual Basic compilers don’t provide an API for accessing it. This changes with Roslyn. Roslyn is a set of compiler APIs that provides you with full access to every stage the compiler moves through. Figure 1 is a diagram of the different stages in the compiler process that are now available in Roslyn.

Figure 1 The Roslyn Compiler Pipeline

Even though Roslyn is still in CTP mode (I used the September 2012 version for this article), it’s worth taking the time to investigate the functionality available in its assemblies, and to learn what you can do with Roslyn. A good place to start is to look at its scripting facility. With Roslyn, C# and Visual Basic code are now scriptable. That is, there’s a scripting engine available in Roslyn into which you can input snippets of code. This is handled via the ScriptEngine class. Here’s a sample that illustrates how this engine can return the current DateTime value:

class Program
{
  static void Main(string[] args)
  {
    var engine = new ScriptEngine();
    engine.ImportNamespace("System");
    var session = engine.CreateSession();
    Console.Out.WriteLine(session.Execute<string>(
      "DateTime.Now.ToString();"));
  }
}

In this code, the engine is created and told to import the System namespace so Roslyn will be able to resolve what DateTime means. Once a session is created, all it takes is a call to Execute, and Roslyn will parse the given code. If it can parse it correctly, it will run it and return the result.

Making C# into a scripting language is a powerful concept. Even though Roslyn is still in CTP mode, people are creating amazing projects and frameworks using its bits, such as scriptcs (scriptcs.net). However, where I think Roslyn really shines is in letting you create Visual Studio extensions to warn you of issues while you write code. In the previous snippet, I used DateTime.Now. If I were on a project that enforced the first bullet point I made at the beginning of the article, I’d be in violation of that standard. I’ll explore how that rule can be enforced using Roslyn. But before I do that, I’ll cover the first stage of compilation: parsing code to get tokens.

Syntax Trees

When Roslyn parses a piece of code, it returns an immutable syntax tree. This tree contains everything about the given code, including trivia such as spaces and tabs. Even if the code has errors, it’ll still try as best it can to give you as much information as possible.

This is all well and good, but how do you figure out where in the tree the pertinent information is? Currently, the documentation on Roslyn is fairly light, which is understandable given that it’s still a CTP. You can use the Roslyn forums to post questions (bit.ly/16qNf7w), or use the #RoslynCTP tag in a Twitter post. There’s also a sample called SyntaxVisualizerExtension when you install the bits, which is an extension for Visual Studio. As you type code in the IDE, the visualizer automatically updates with the current version of the tree.

This tool is indispensable in figuring out what you’re looking for and how to navigate the tree. In the case of using .Now on the DateTime class, I figured out that I needed to find a MemberAccessExpression (or, to be precise, a MemberAccessExpressionSyntax-based object) where the last IdentifierName value equals Now. Of course, that’s for the simple case where you’d type “var now = DateTime.Now;”. You could also put “System.” in front of DateTime, or use “using DT = System.DateTime;”. Furthermore, there may be a property called Now on a different class in the system. All of these cases must be processed correctly.
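
To make those cases concrete, the following sketch spells out each form in plain C#. The Clock and NowVariants types are hypothetical and exist only for illustration. The first three methods all resolve to System.DateTime.Now, while the fourth merely shares the identifier text.

```csharp
using System;
using DT = System.DateTime;   // alias: DT.Now is still System.DateTime.Now

// Hypothetical type with an unrelated Now property. A rule that only
// compares the identifier text "Now" would flag it by mistake.
public static class Clock
{
    public static string Now { get { return "not a DateTime"; } }
}

public static class NowVariants
{
    // The simple case.
    public static DateTime Simple() { return DateTime.Now; }

    // The same property, reached through the fully qualified type name.
    public static DateTime FullyQualified() { return System.DateTime.Now; }

    // The same property again, reached through the using alias.
    public static DateTime Aliased() { return DT.Now; }

    // Same identifier text, different symbol: should not be reported.
    public static string Unrelated() { return Clock.Now; }
}
```

Text matching alone can’t tell DT.Now from Clock.Now, which is why the rule needs to know which symbol each Now actually refers to.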

Finding and Solving Code Issues

Now that I know what to find, I need to create a Roslyn-based Visual Studio extension to hunt down DateTime.Now property usage. To do this, you simply select the Code Issue template under the Roslyn option in Visual Studio.

Once you do this, you’ll get a project that contains one class called CodeIssueProvider. This class implements the ICodeIssueProvider interface, though you don’t have to implement each of its four members. In this case, only the members that work with SyntaxNode types are used; the others can throw NotImplementedException. You implement the SyntaxNodeTypes property by specifying which syntax node types you want to handle with the corresponding GetIssues method. As was mentioned in the previous example, MemberAccessExpressionSyntax types are the ones that matter. The following code snippet shows how you implement SyntaxNodeTypes:

public IEnumerable<Type> SyntaxNodeTypes
{
  get
  {
    return new[] { typeof(MemberAccessExpressionSyntax) };
  }
}

This is an optimization for Roslyn. By having you specify which types you care to examine in more detail, Roslyn doesn’t have to call the GetIssues method for each syntax type. If Roslyn didn’t have this filtering mechanism in place and called your code provider for every node in the tree, the performance would be appalling.

Now all that’s left is to implement GetIssues such that it will only report use of the Now property. As I mentioned in the previous section, you only want to find cases where Now has been used on DateTime. When you’re using tokens, you don’t have a lot of information besides the text. However, Roslyn provides what’s called a semantic model, which can provide a lot more information about the code under examination. The code in Figure 2 demonstrates how you can find DateTime.Now usages.

Figure 2 Finding DateTime.Now Usages

public IEnumerable<CodeIssue> GetIssues(
  IDocument document, CommonSyntaxNode node, 
  CancellationToken cancellationToken)
{
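  var memberNode = node as MemberAccessExpressionSyntax;
  // Guard the cast, then filter on the identifier text before doing
  // the costlier semantic lookup.
  if (memberNode != null &&
    memberNode.OperatorToken.Kind == SyntaxKind.DotToken &&
    memberNode.Name.Identifier.ValueText == "Now")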
  {
    var symbol = document.GetSemanticModel()
        .GetSymbolInfo(memberNode.Name).Symbol;
    if (symbol != null &&
      symbol.ContainingType.ToDisplayString() ==
        Values.ExpectedContainingDateTimeTypeDisplayString &&
      symbol.ContainingAssembly.ToDisplayString().Contains(
        Values.ExpectedContainingAssemblyDisplayString))
    {
      return new [] { new CodeIssue(CodeIssueKind.Error,
        memberNode.Name.Span,
        "Do not use DateTime.Now",
        new ChangeNowToUtcNowCodeAction(document, memberNode))};
    }
  }
  return null;
}

You’ll notice the cancellationToken argument isn’t used, nor is it used anywhere in the sample project that accompanies this article. This is a deliberate choice, because putting code into the sample that constantly checks the token to see if the processing should stop can be distracting. But if you’re going to create Roslyn-based extensions that are production-ready, you should make sure you check the token often and stop if the token is in the canceled state.
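
For reference, the usual .NET pattern for honoring a token looks something like the sketch below. It’s deliberately Roslyn-free: CancellationSketch and CountNow are hypothetical names, and the loop over strings merely stands in for real analysis work. Whether a provider should throw or simply return early when canceled is a design choice this sketch doesn’t settle.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CancellationSketch
{
    // Hypothetical stand-in for a long-running analysis: counts the
    // identifiers equal to "Now", checking the token before each item
    // so stale work can be abandoned quickly.
    public static int CountNow(
        IEnumerable<string> identifiers,
        CancellationToken cancellationToken)
    {
        var count = 0;
        foreach (var identifier in identifiers)
        {
            // Throws OperationCanceledException once cancellation
            // has been requested.
            cancellationToken.ThrowIfCancellationRequested();
            if (identifier == "Now")
            {
                count++;
            }
        }
        return count;
    }
}
```

Passing CancellationToken.None runs the loop to completion, whereas a token from a CancellationTokenSource that has already been canceled stops the method before it examines the first item.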

Once you’ve determined that your member access expression is trying to get a property called Now, you can get symbol information for that token. You do this by getting the semantic model for the tree, and then you get a reference to an ISymbol-based object via the Symbol property. Then, all you have to do is get the containing type and see if its name is System.DateTime and if its containing assembly name includes mscorlib. If that’s the case, that’s the issue you’re looking for, and you can flag it as an error by returning a CodeIssue object.
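
The Values class referenced in the figures isn’t listed in this article. Based on the description above, a plausible sketch of it follows. The exact strings are assumptions; in particular, the DateTimeKind value is inferred by analogy with the DateTime one.

```csharp
// Assumed contents of the Values helper used by the code issue
// providers. The strings are inferred from the text, not copied
// from the sample project.
public static class Values
{
    // Display string of the type that declares the Now property.
    public const string ExpectedContainingDateTimeTypeDisplayString =
        "System.DateTime";

    // Display string of the enumeration that declares Utc, Local
    // and Unspecified (assumed by analogy).
    public const string ExpectedContainingDateTimeKindTypeDisplayString =
        "System.DateTimeKind";

    // Substring expected within the containing assembly's display string.
    public const string ExpectedContainingAssemblyDisplayString =
        "mscorlib";
}
```

Keeping these strings in one place means every provider compares against the same expected names.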

This is good progress so far, because you’ll see a red squiggly error line underneath the Now text in the IDE. But it doesn’t go far enough. It’s nice when the compiler tells you your code is missing a semicolon or a curly brace. Getting error information is better than nothing at all, and with simple errors it’s usually pretty easy to fix them based on the error message. However, wouldn’t it be nice if tools could just figure out errors all by themselves? I like being told when I’m wrong—and I’m much happier when the error message gives me detailed information explaining how I can fix the issue. And if that information could be automated such that a tool could resolve the issues for me, that’s less time I have to spend on the problem. The more time saved, the better.

That’s why you see in the previous code snippet a reference to a class called ChangeNowToUtcNowCodeAction. This class implements the ICodeAction interface, and its job is to change Now to UtcNow. The main method you have to implement is called GetEdit. In this case, the Name token in the MemberAccessExpressionSyntax object needs to be changed to a new token. As the following code shows, it’s pretty easy to make this replacement:

public CodeActionEdit GetEdit(CancellationToken cancellationToken)
{
  var nameNode = this.nowNode.Name;
  var utcNowNode =
    Syntax.IdentifierName("UtcNow");
  var rootNode = this.document.
    GetSyntaxRoot(cancellationToken);
  var newRootNode =
    rootNode.ReplaceNode(nameNode, utcNowNode);
  return new CodeActionEdit(
    document.UpdateSyntaxRoot(newRootNode));
}

All you need to do is create a new identifier with the UtcNow text, and replace the Now token with this new identifier via ReplaceNode. Remember that syntax trees are immutable, so you don’t change the current document tree. You create a new tree, and return that tree from the method call.
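
If the idea of replacing a node in an immutable tree seems odd, the following toy sketch may help. The Node class is a hypothetical stand-in and not a Roslyn type; it only mimics the behavior described above, where replacing a node returns a new root and leaves the original tree untouched.

```csharp
using System;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical, greatly simplified stand-in for a syntax node.
public sealed class Node
{
    public Node(string text, params Node[] children)
    {
        this.Text = text;
        // Copy the array so later changes by the caller cannot leak in.
        this.Children = Array.AsReadOnly((Node[])children.Clone());
    }

    public string Text { get; private set; }

    public ReadOnlyCollection<Node> Children { get; private set; }

    // Returns a tree in which oldNode is replaced by newNode.
    // Subtrees that do not contain oldNode are reused, not copied.
    public Node ReplaceNode(Node oldNode, Node newNode)
    {
        if (ReferenceEquals(this, oldNode))
        {
            return newNode;
        }

        var changed = false;
        var newChildren = new Node[this.Children.Count];
        for (var i = 0; i < this.Children.Count; i++)
        {
            newChildren[i] = this.Children[i].ReplaceNode(oldNode, newNode);
            changed |= !ReferenceEquals(newChildren[i], this.Children[i]);
        }

        return changed ? new Node(this.Text, newChildren) : this;
    }

    // Leaves print their text; inner nodes join their children with it.
    public override string ToString()
    {
        return this.Children.Count == 0
            ? this.Text
            : string.Join(this.Text, this.Children.Select(c => c.ToString()));
    }
}
```

Building a "." node over DateTime and Now leaves, then replacing the Now leaf with a UtcNow leaf, produces a second root that prints as DateTime.UtcNow. The first root still prints as DateTime.Now, and both roots share the same DateTime leaf.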

With all of this code in place, you can test it out in Visual Studio by simply pressing F5. This launches a new instance of Visual Studio with the extension automatically installed.

Analyzing DateTime Constructors

This is a good start, but there are more cases that have to be handled. The DateTime class has a number of constructors defined that can cause issues. There are two cases in particular to be aware of:

  1. The constructor may not take a DateTimeKind enumeration type as one of its parameters, which means the resulting DateTime will be in the Unspecified state.
  2. The constructor may take a DateTimeKind value as one of its parameters, which means you may specify an enumeration value other than Utc.
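
Both cases are easy to reproduce with nothing but System.DateTime. In this small sketch the DateTimeKindExamples class and its method names are hypothetical; the constructors and the Kind property are the standard ones.

```csharp
using System;

public static class DateTimeKindExamples
{
    // Case 1: no DateTimeKind parameter, so Kind is Unspecified.
    public static DateTimeKind WithoutKind()
    {
        return new DateTime(2013, 7, 1).Kind;
    }

    // Case 2: a DateTimeKind parameter is present, but it isn't Utc.
    public static DateTimeKind WithLocalKind()
    {
        return new DateTime(2013, 7, 1, 0, 0, 0, DateTimeKind.Local).Kind;
    }

    // What the rule wants to see: the kind is explicitly Utc.
    public static DateTimeKind WithUtcKind()
    {
        return new DateTime(2013, 7, 1, 0, 0, 0, DateTimeKind.Utc).Kind;
    }
}
```

Only the last method produces a value whose Kind is DateTimeKind.Utc.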

You can write code to find both conditions. However, I’ll only create a code action for the second one.

Figure 3 lists the code for the GetIssues method in the ICodeIssueProvider-based class that will find bad DateTime constructor calls.

Figure 3 Finding Bad DateTime Constructor Calls

public IEnumerable<CodeIssue> GetIssues(
  IDocument document, CommonSyntaxNode node, 
  CancellationToken cancellationToken)
{
  var creationNode = node as ObjectCreationExpressionSyntax;
  var creationNameNode = creationNode.Type as IdentifierNameSyntax;
  if (creationNameNode != null && 
    creationNameNode.Identifier.ValueText == "DateTime")
  {
    var model = document.GetSemanticModel();
    var creationSymbol = model.GetSymbolInfo(creationNode).Symbol;
    if (creationSymbol != null &&
      creationSymbol.ContainingType.ToDisplayString() ==
        Values.ExpectedContainingDateTimeTypeDisplayString &&
      creationSymbol.ContainingAssembly.ToDisplayString().Contains(
        Values.ExpectedContainingAssemblyDisplayString))
    {
      var argument = FindingNewDateTimeCodeIssueProvider
        .GetInvalidArgument(creationNode, model);
      if (argument != null)
      {
        if (argument.Item2.Name == "Local" ||
          argument.Item2.Name == "Unspecified")
        {
          return new [] { new CodeIssue(CodeIssueKind.Error,
            argument.Item1.Span,
            "Do not use DateTimeKind.Local or DateTimeKind.Unspecified",
            new ChangeDateTimeKindToUtcCodeAction(document, 
              argument.Item1)) };
        }
      }
      else
      {
        return new [] { new CodeIssue(CodeIssueKind.Error,
          creationNode.Span,
          "You must use a DateTime constructor that takes a DateTimeKind") };
      }
    }
  }
  return null;
}

It’s very similar to the other issue. Once you know the constructor comes from a DateTime, you need to evaluate the arguments. (I’ll explain what GetInvalidArgument does in a moment.) If you find an argument of the DateTimeKind type and it doesn’t specify Utc, you have a problem. Otherwise, if there’s no DateTimeKind argument at all, you know you’re using a constructor that won’t have the DateTime in Utc, so that’s another issue to report. Figure 4 shows what GetInvalidArgument looks like.

Figure 4 The GetInvalidArgument Method

private static Tuple<ArgumentSyntax, ISymbol> GetInvalidArgument(
  ObjectCreationExpressionSyntax creationToken, ISemanticModel model)
{
  foreach (var argument in creationToken.ArgumentList.Arguments)
  {
    if (argument.Expression is MemberAccessExpressionSyntax)
    {
      var argumentSymbolNode = model
        .GetSymbolInfo(argument.Expression).Symbol;
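      // The symbol can be null when the expression doesn't resolve,
      // so guard before dereferencing it.
      if (argumentSymbolNode != null &&
        argumentSymbolNode.ContainingType != null &&
        argumentSymbolNode.ContainingType.ToDisplayString() ==
        Values.ExpectedContainingDateTimeKindTypeDisplayString)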
      {
        return new Tuple<ArgumentSyntax,ISymbol>(argument, 
            argumentSymbolNode);
      }
    }
  }
  return null;
}

This search is very similar to the others. If the argument type is DateTimeKind, you know you have a potentially invalid argument value. To fix the argument, the code is virtually identical to the first code action you saw, so I won’t repeat it here. Now, if other developers try to get around the DateTime.Now restriction, you can catch them in the act, and correct the constructor calls, too!

In the Future

It’s wonderful to think about all the tools that will be created with Roslyn, but work still needs to be done. One of the biggest frustrations I think you’ll have with Roslyn right now is the lack of documentation. There are good samples online and in the installation bits, but Roslyn is a large API set and it can be confusing finding out exactly where to start and what to use to accomplish a particular task. It’s not uncommon to have to dig around for a while to figure out the right calls to use. The encouraging aspect is that I’m usually able to do something in Roslyn that seems fairly complex at first but ends up being fewer than 100 or 200 lines of code.

I believe that as Roslyn gets closer to release, everything surrounding it will improve. And I’m also convinced that Roslyn has the potential to underpin many frameworks and tools in the .NET ecosystem. I don’t see every .NET developer using the Roslyn APIs on a day-to-day basis directly, but you’ll probably end up using bits that use Roslyn at some level. This is why I’m encouraging you to dive into Roslyn and see how things work. Being able to codify idioms into reusable rules that every developer on a team can take advantage of helps everyone quickly produce better code.


Jason Bock is a practice lead at Magenic (magenic.com) and recently coauthored “Metaprogramming in .NET” (Manning Publications, 2013). Reach him at jasonb@magenic.com.

THANKS to the following technical experts for reviewing this article: Kevin Pilch-Bisson (Microsoft), Dustin Campbell, Jason Malinowski (Microsoft), Kirill Osenkov (Microsoft)