Bugslayer
Three Vital FXCop Rules
John Robbins
Code download available at:Bugslayer0409.exe(171 KB)
Contents
Introducing New Bugslayer Rules
Avoid Boxing and Unboxing Rule
Exception Documentation Rules
Inspecting Local Variables
Wrap-Up
Tips
In the June 2004 installment of the Bugslayer column, I introduced the amazing FxCop, which analyzes your .NET assemblies for errors and problems based on code that violates the .NET Design Guidelines. This month I'll continue my discussion of building useful and advanced rules that will ensure you're writing the best .NET code possible. I'll assume that you've used FxCop extensively and that you've read the June 2004 Bugslayer column so that you have the basics of rule development under your belt.
For this column, I'm going to dive deeper into building rules, concentrating on those that work directly with the Microsoft® intermediate language (MSIL) instructions in your methods. But first, I think it's best to describe the three rules I developed so you can follow what I did to build them.
As I did in the last column, I need to make a small disclaimer here. The FxCop team has not documented the Introspection engine yet and its implementation is subject to change in future releases. However, as you'll see in the implementation section, what it offers in power and simplicity more than makes up for all the poking around that I did with Lutz Roeder's Reflector to figure out how it worked.
Introducing New Bugslayer Rules
The first of the rules I tackled was Avoid Boxing and Unboxing. As has been beaten into the head of nearly every developer using .NET, converting from value types to objects (boxing) and objects to value types (unboxing) is a bad thing to do frequently because it can kill your performance. However, unless you are adept at divining exactly what MSIL the compiler will generate for your source code, it's hard to know exactly when boxing and unboxing are happening without inspecting all your MSIL with Reflector or ILDASM. So I set out to build a rule that would automatically tell me about boxing and unboxing.
When I first developed this Avoid Boxing and Unboxing rule, I had it set to report an error whenever a single instruction of either type appeared in a method. After a lot of testing and a lot of use, I found that there were quite a few places where boxing occurred that I had no control over, such as in a Windows® Forms wizard-generated InitializeComponent method.
Interestingly, unboxing was relatively rare, but if it did show up in the MSIL stream, it meant I needed to look at the method for possible performance problems. So I changed the Avoid Boxing and Unboxing rule to allow up to three BOX instructions or one UNBOX instruction before reporting an error in the method. (By the way, this rule works on all methods regardless of their declared visibility level—from private to public.)
Before I move onto the next rule, I want to mention one place where you might see the Avoid Boxing and Unboxing rule trigger a false positive. In C# you can use actual strings as the case values inside switch statements. If you have more than 11 strings as case statements, the C# compiler decides to generate a Hashtable lookup system instead of using string comparison branching. In those cases, it's much faster to use the Hashtable, so it's a good choice. The problem is that the key for the Hashtable is the string, but the compiler generates an integer for the value. Consequently, there are chains of box instructions and the rule reports an error. Strings are not used frequently as case values (and rarely in this quantity), so you can safely exclude the error report when it's triggered in these cases.
The second rule I implemented, Exception Documentation Missing, is one that I strongly feel should be enforced by the Design Guidelines. The idea is that these wonderful XML Documentation Comments can be put in our C# source code (and Visual Basic® and Visual C++® in Visual Studio® 2005), but everyone forgets to include the most important tag of all, <exception>. If a method directly throws an exception, there better be documentation letting the consumer of that method know exactly what exceptions it throws. By "directly thrown" exceptions I mean those specifically coded with a throw or rethrow in the method itself. The method user knows that these kinds of exceptions can occur, so he can program accordingly. While other exceptions may occur from submethods that the documented method may call, those are most likely catastrophic errors that are best handled in the application unhandled exception handlers.
My rule grinds through a method and determines the exact list of directly thrown and rethrown exceptions and compares that list to the documented exceptions for that method from the assembly's XML comments documentation file. If there are any exception types not listed in the documentation with <exception> tags, FxCop reports the errors. If there are multiple undocumented exceptions thrown or rethrown, each gets a separate error.
To best take advantage of Exception Documentation Missing, you'll always want to produce an XML comments documentation file when building your assemblies for both debug and release builds. In a debug build, if you leave the default option "incremental compiling" turned on, your builds will stop if you also have "treat all warnings as errors" turned on (as you should). I've not noticed that it compiles any faster turning off incremental builds. Therefore, in the Project Properties dialog, and in the Configuration Properties, Advanced page, I always set Incremental Build to False. The Exception Documentation Missing rule only looks at your public methods.
The final rule, Exception Documentation Invalid, does the opposite of the Exception Documentation Missing rule. If you have an <exception> tag documenting that an exception can be directly thrown or rethrown, but that exception is not actually directly thrown by the method, you could easily confuse the method's user because you have incorrect documentation. Fortunately, this is not a very common problem, but this rule was very easy to develop because I did all the hard work in Exception Documentation Missing, so I went ahead and included it.
Now that you have the rule requirements and know how they work, I can turn to the implementation portions. While you might think the rules took a huge effort, I was very surprised at how little code they really required, which is a testament to the very nice Introspection engine that the FxCop team has implemented.
Avoid Boxing and Unboxing Rule
In the FxCop 1.23 days, I had implemented each of the three rules using the Reflection engine. Since each of these rules required looking at the MSIL for the methods, the implementation required a lot of grunt work. That's because the MSIL for a method was either returned as a whole string blob that I had to parse, or I had to "walk" the MSIL instructions in a rudimentary fashion. Since the Introspection engine is so superior (see the June 2004 column), I decided to rewrite the rules.
As you'll see in a moment, writing the Avoid Boxing and Unboxing rule turned out to be very straightforward. The one weakness in the rule is that I haven't devised a scheme to let the user configure the number of BOX or UNBOX instructions they want to allow. After running my rule on a ton of assemblies, I felt that a default that allows three BOX instructions and zero UNBOX instructions was reasonable as a hardcoded default.
In poking around with Reflector, I found an enticingly named interface, IConfigurableRule, which seemed to indicate that there might be support for some sort of user interface for setting rule options. Unfortunately, none of the rules that ship with FxCop uses the IConfigurableRule interface so it took a lot of guesswork to set up my rule. An even bigger problem was that I needed to figure out if the FxCop GUI would even show some sort of user interface. While I was starting to make some progress, I thought I should first check with the FxCop team to see if IConfigurableRule was the way to go. They informed me that I definitely do not want to be barking up that tree because that interface is going away sometime in the future.
When you actually step back and think about how to allow the user to set the options to a rule, it's a tough problem. You first need to have some sort of consistent UI paradigm that can handle arbitrary data. While that sounds relatively simple, the problems start to pile on when you think about the types of options a rule developer might want to expose to the user. Some rules, like Avoid Boxing and Unboxing, would probably be project specific, but others could be both global to all projects and still have per-project data. It won't be easy, but I certainly hope that the FxCop team will support configurable rules in the future.
While I had to settle for a hardcoded instruction count setting, I knew it was going to be relatively easy to do the actual logic to Avoid Boxing and Unboxing. In the Reflection engine, the Method type had a property, Instructions, that I could use to access the individual MSIL stream instructions. The Introspection engine offers the same functionality, so counting the BOX and UNBOX instructions is as simple as pounding through that array. In Figure 1, the commented out WalkInstructionsManually method does exactly that. As you scrutinize the rules supplied with FxCop, you'll see that many of them, such as Consider Calling Data Members Dispose Method in UsageRules.DLL, use the same technique for performing their analysis.
Figure 1 AvoidBoxingAndUnboxing.CS
using System ; using System.Xml ; using System.Collections ; using System.Globalization ; using Microsoft.Cci ; using Microsoft.Tools.FxCop.Sdk ; using Microsoft.Tools.FxCop.Sdk.Introspection ; namespace Bugslayer.FxCop.PerformanceRules { /// <summary>Rule to check for excessive boxing/unboxing.</summary> [CLSCompliant(false)] public class AvoidBoxingAndUnboxing : BasePerformanceRule { int boxInstuctCount=0; // # of BOX instructions int unboxInstructCount=0; // # of UNBOX instructions static int maxBoxInstruct=3; // max # BOX instructions allowed static int maxUnBoxInstruct=0; // max # UNBOX instructions allowed /// <summary>Gets the base class hooked up.</summary> public AvoidBoxingAndUnboxing() : base ("AvoidBoxingAndUnboxing") { } /// <summary>Check for excessive box/unbox instructions.</summary> /// <param name="memberMethod">The method to check.</param> /// <returns>null if no problems; otherwise, !null.</returns> public override Problem[] Check ( Method memberMethod ) { // Always set the counters to a known value before starting. boxInstuctCount = 0 ; unboxInstructCount = 0 ; // Walk this method's code. I'm only interested in the // body of the method, not all the attributes. VisitBlock ( memberMethod.Body ) ; // Alternatively, you could walk the instructions manually. //WalkInstructionsManually ( memberMethod ) ; Problem[] retvalue = null ; if ( ( boxInstuctCount > maxBoxInstruct ) || ( unboxInstructCount > maxUnBoxInstruct ) ) { // Got an error. String[] arr = new String[] { memberMethod.Name.Name } ; retvalue = Problems.AsArray ( GetResolution ( arr ) ) ; } return ( retvalue ) ; } /// <summary>Called for binary expressions in the IL.</summary> /// <param name="binaryExpression"> /// The binary expression being evaluated. /// </param> /// <returns>Whatever the base class method returns.</returns> public override Expression VisitBinaryExpression( BinaryExpression binaryExpression ) { // NodeType contains the instruction I'm interested in switch ( binaryExpression.NodeType ) { case NodeType.Box : boxInstuctCount++ ; break ; case NodeType.Unbox : case NodeType.UnboxAny : unboxInstructCount++ ; break ; } return ( base.VisitBinaryExpression ( binaryExpression ) ) ; } // You could do the same operation without StandardVisitor. /* private void WalkInstructionsManually ( Method memberMethod ) { for ( int i = 0 ; i < memberMethod.Instructions.Length ; i++ ) { switch ( memberMethod.Instructions[i].OpCode ) { case OpCode.Box : boxInstuctCount++ ; break ; case OpCode.Unbox : case OpCode.Unbox_Any : unboxInstructCount++ ; break ; } } } */ #region Protection Level Overrides /// <summary>Ensure all nested types are seen.</summary> override public ProtectionLevels NestedTypeProtectionLevel { get { return ( ProtectionLevels.All ) ; } } /// <summary>Ensure all types are seen.</summary> override public ProtectionLevels TypeProtectionLevel { get { return ( ProtectionLevels.All ) ; } } /// <summary>Ensure all members are seen.</summary> override public ProtectionLevels MemberProtectionLevel { get { return ( ProtectionLevels.All ) ; } } #endregion } }
With the Introspection engine, there is another way to walk the instruction chain. If you look at the BaseIntrospectionRule type in the Microsoft.Tools.FxCop.Sdk.Introspection namespace, you'll see it derives from another type, Microsoft.Cci.StandardVisitor (from the Microsoft.Cci namespace in Microsoft.Metadata.Prototype.DLL). The StandardVisitor class has all sorts of enticing methods on it like VisitAssignment, VisitIf, VisitForEach, VisitFinally, and VisitParameterList. In fact, there are 137 Visit* methods. All of those methods looked very interesting because they seemed to let you ask for higher-level constructs in the MSIL for the code and much more. I've run across a few rules that used the Visit* methods to perform some really deep analysis. A perfect example is the rule Test For Empty Strings Using String Length in PerformanceRules.DLL.
Knowing that the other two rules I needed to implement required that I get the types thrown, I wanted to use the Visit* approach for Avoid Boxing and Unboxing. The BOX and UNBOX instructions are binary expressions so I overrode the VisitBinaryExpression method when encountering the BOX and UNBOX instructions. Using the FxCop-supplied rules as a guide, to get the visiting started, I called VisitMethod in my Check method, passing in just the Method.Body as all I needed was to scan the MSIL for the method.
In my overridden VisitBinaryExpression, I just needed to take a peek at the NodeType of the BinaryExpression parameter and I was all set. As I was looking at the NodeType, I realized that I also needed to handle the UNBOXANY instruction, so I changed my code to do that. Figure 1 shows everything in action. For this simple rule, I probably didn't need to use the Visit* methods; running through the instructions would have worked just fine. In fact, I think that would have been faster. However, as you'll see, the Visit* methods are where the action is.
Exception Documentation Rules
Before I get into the Exception Documentation Missing Rule implementation with the Introspection engine, I want to show you just how nasty it was to implement this rule in the Reflection engine. While I needed to find the THROW and RETHROW instructions, I really needed to find the types that were thrown. In the Reflection engine, I could do the rudimentary instruction walking, but getting the types for all cases wasn't possible if I did so. However, if I could look at all the MSIL assembly language as a string, I could see the types. While I could have written a big state machine to keep track of types as they came through the rudimentary instruction walking, that would have required a lot of code. There was, fortunately, a means to ask the Reflection engine to disassemble the MSIL into a string buffer.
My plan was to take the whole buffer of the MSIL text and write a regular expression that would extract the exception types for me. I thought that would be much easier than actually coding the state machine. Armed with an outstanding book on regular expressions, Mastering Regular Expressions, 2nd Edition by Jeffrey E. F. Friedl (O'Reilly, 2002), and Chris Sells and Michael Weinhardt's great RegexDesigner.NET, I was ready to go to work. After a day of playing around, I came up with the two following regular expressions that worked great for extracting the direct thrown type (the first one) and number of the local variable (the second one) from a string buffer:
static Regex s_DirectThrowRegex = new Regex ( "(?<=^IL_[0-9a-fA-F]+\\tnewobj )[a-zA-Z_\\.\\+]+" + "(?= \\:\\: [\\w\\(\\)\\,\\. \\t]+\\r\\n" + "IL_[0-9a-fA-F]+\\tthrow\r\n)" , RegexOptions.Multiline | RegexOptions.Compiled ) ; static Regex s_LocalVarThrowRegex = new Regex ( "(?<=^IL_[0-9a-fA-F]+\\t+ldloc\\.(s )?)[0-9]+" + "(?=\\r\\nIL_[0-9a-fA-F]+\\t+throw\\r\\n)" , RegexOptions.Multiline | RegexOptions.Compiled ) ;
Being so proud that I'd been able to write such cool regular expressions that worked on every bit of MSIL I threw at them, I wasn't prepared to learn that due to a bug in the Reflection engine, you couldn't access the local variables for a method. I eventually got everything to work by using Lutz Roeder's ILReader to get the local variables and do the actual analysis, which made me a lot happier and proud again that my plan was working!
After all the fun I had getting the rule to work with the Reflection engine, I was a little concerned about how much work it was going to take to use the Introspection engine. Actually, it was drastically simpler to do.
Poking through the Visit* methods, I saw that the VisitThrow method looked like it fit the bill just as VisitBinaryExpression did for the BOX instructions. The Node class, which is the base type for all parameters to the Visit* methods, contains lots of descriptive information about operands, values, and other important items you'll need to build great analysis rules.
The Throw class, which is passed to the VisitThrow method has an Expression property that can access the Type that's being thrown. Consequently, the statement Throw.Expression.Type.FullName is all that it takes to get the type being thrown, regardless of whether it's a throw followed by a "new exception type" or a throw of a local variable. In the Reflection engine, I had to build those two intense regular expressions to get even close, so I think you can see the power of the Introspection engine.
Of course, my goals weren't easy to achieve, even with the Introspection engine. When I started the work to process the RETHROW instructions, the Expression property was null so I had a problem finding the type. While I could have punted, deciding not to handle rethrows, I really felt they were important to document. After a good bit of poking around, I realized that I needed to look at the catch blocks in the method, which contain information on the exception type to be rethrown, and see if I could match up the rethrow to its actual block.
Getting the exception handlers for a method is trivial because the Introspection engine hands them to you right in the Method.ExceptionHandlers property. One thing I should tell you at this point about the Introspection engine metadata reader is that it's rather lazy—it won't read in the metadata until it is absolutely necessary—so I only accessed Method.ExceptionHandlers in the VisitThrow method. The act of Visiting gets the metadata all read in.
While having all the catch blocks was great, I wasn't sure how I was going to exactly match a RETHROW instruction to the catch block where it occurred. After some poking around, I found a very interesting property, UniqueKey, that all Node values contain. I wondered how unique it was because if everything provided by the Introspection engine (from individual instructions through higher constructs) had a unique value, I could look for the particular RETHROW instruction inside each catch block's instructions so I could match them up. Sure enough, that's exactly how everything works out.
Once I had my VisitThrow method working and gathering the exceptions thrown directly from a method, I turned to processing the XML documentation file. You can see all of that in action in the XmlDocCommentsFileDictionary.CS file, included in the download for this column.
Figure 2 shows the Check and VisitThrow methods, which are common to both the Exception Documentation Missing and Exception Documentation Invalid rules. I set up both rules to compare an array of directly thrown exceptions and documented exceptions, so the majority of the work to build up those arrays could be shared. The only unique part of the rules is the actual comparison. You can see all the code in Figure 2 in the Bugslayer.FxCop.DesignRules project's DesignRules.CS file.
Figure 2 Checking Documented Exceptions
/// <summary>The common method that calls the RuleLogic method.</summary> /// <param name="memberMethod">The method to check.</param> /// <returns> /// null - The derived rule reported no problems. /// !null - The error message to report. /// </returns> public override Problem[] Check ( Method memberMethod ) { // Start by saving off the method under evaluation. currentMethod = memberMethod ; methodExceptions = new ArrayList ( ) ; // Get the XML Doc Comment file for this module. String mod = memberMethod.DeclaringType.DeclaringModule.Location ; XmlDocCommentsFileDictionary currXmlDocs = OpenModuleXmlDocFile(mod) ; // Start walking the body of this method's code. VisitBlock ( memberMethod.Body ) ; // Now that pounding through the IL is done, get the documented // exception tags for this method into an array list. ArrayList docdExceptions = null ; if ( null == currXmlDocs ) { docdExceptions = new ArrayList ( ) ; } else { // Get the documented exceptions for this method. docdExceptions = currXmlDocs.GetExceptionTypesThrown( FixCciDocumentationName ( memberMethod.DocumentationId.Name )); } methodExceptions.Sort ( ) ; docdExceptions.Sort ( ) ; // Ask the derived type to do its logic. If there are any items in // the problemExceptTypes array, those are the ones with problems. ArrayList problemExceptTypes = RuleLogic ( methodExceptions , docdExceptions ) ; // Were there any problems to report? Problem[] retvalue = null ; if ( problemExceptTypes.Count > 0 ) { // Yep, create the Problem report. foreach ( String exVal in problemExceptTypes ) { Problems.Add ( GetResolution( memberMethod.Name.ToString ( ), exVal )); } retvalue = Problems.AsArray ( ) ; } return ( retvalue ) ; } /// <summary>Visits each throw/rethrow instruction.</summary> /// <param name="inst">The throw instruction itself.</param> /// <returns>Whatever the base class VisitThrow method returns.</returns> public override Statement VisitThrow ( Throw inst ) { if ( inst.NodeType == NodeType.Throw ) { // The easy case! Grab the Expression.Type field and I've got // the type being thrown. methodExceptions.Add ( inst.Expression.Type.FullName ) ; } else if ( inst.NodeType == NodeType.Rethrow ) { // A little harder case. There's no type associated with the // rethrow instruction so I need to grovel through the // exception handlers and match up this instruction to the // appropriate handle and extract the type that way. ExceptionHandlerList exceptionHandlerList = currentMethod.ExceptionHandlers ; // Just to make sure.... Debug.Assert ( exceptionHandlerList.Length > 0 , "exceptionHandlerList.Length > 0" ) ; // Loop through the exception handlers. bool foundType = false ; for ( int i = 0 ; i < exceptionHandlerList.Length ; i++ ) { ExceptionHandler exh = exceptionHandlerList[ i ] ; // I only care if it's a catch block. if ( NodeType.Catch == exh.HandlerType ) { // Loop through the statements in the handler block. StatementList statements = exh.HandlerStartBlock.Statements ; for ( int currStatement = 0 ; currStatement < statements.Length ; currStatement++ ) { if ( inst.UniqueKey == statements[currStatement].UniqueKey ) { // Get the type being rethrown. methodExceptions.Add (exh.FilterType.FullName); foundType = true ; break ; } } if ( true == foundType ) break ; } } Debug.Assert ( true == foundType , "true == foundType" ) ; } #if DEBUG else // Just because I'm paranoid... { Debug.Assert ( false , "Invalid Throw type!!" ) ; } #endif return ( base.VisitThrow ( inst ) ) ; }
Inspecting Local Variables
I am frequently asked: "How can I get the local variables with the Introspection engine?" The trick is to override the ubiquitous Check(Method) member, see if there are more than zero instructions, and ask the instructions for the locals. The code for getting the locals is shown here:
public override Problem[] Check ( Method memberMethod ) { InstructionList instructions = memberMethod.Instructions ; if ( 0 == instructions.Length ) return null ; LocalList locals = instructions[0].Value as LocalList ; if ( null == locals ) return null ; // Do the work you want on locals... }
It seems that everyone wants access to the local variables in a method so they can apply naming conventions to locals as you do public method parameters. I think that's overkill, but to each his own. One other thing to keep in mind is that you'll see compiler-generated locals. For C#, those variables start with the string "CS$" and for Visual Basic .NET they start with "VB_".
Wrap-Up
The FxCop team has mentioned that they are going to be producing some documentation for the Introspection engine in the future, so be on the lookout. If you're looking for ideas for rules you can write, here are a couple to get you going.
You could create a rule that checks that all classes derived from Windows.Forms.Controls have the AccessibleName and AccessibleDescription property fields filled out. Not only are those fields vital if accessibility tools such as screen readers are going to work, they are also used by the Windows XP Tablet PC Edition 2005 Tablet Input Panel as well.
You also might want to write a rule that checks methods that return a collection to ensure that an actual collection is returned, not a null value. Take a look at the Properties that Return Collections Should Be Read-Only in UsageRules.DLL for hints on how to get started writing it.
Tips
Tip 65 Lutz Roeder's Reflector (mentioned earlier) offers a great Add-In model. Dennis Bauer has written a very nice Add-In, Reflector.FileDisassembler, that lets you decompile a whole assembly to source files ready for compiling. Download it with source code at https://www.denisbauer.com/NETTools.
Tip 66 Microsoft Virtual PC is a fantastic product that makes it a snap to create a new machine for testing. Many people don't realize that you can create Differencing Virtual Hard Disks. These virtual machines use another virtual machine's disk image as a baseline and only store the differences from the baseline in the new virtual machine. Instead of going through an installation of the whole operating system, you can create a new ready-to-run virtual machine in under a minute. Obviously, you'll want to keep the baseline virtual machine read-only and not use it, so you don't mess up your differencing virtual machines. See the Virtual PC documentation for more information.
Send your questions and comments for John to slayer@microsoft.com.
John Robbins is a cofounder of Wintellect, a software consulting, education, and development firm that specializes in programming for .NET and Windows. His latest book is Debugging Applications for Microsoft .NET and Microsoft Windows (Microsoft Press, 2003). You can contact John at https://www.wintellect.com.