Bugslayer
Minidumps for Specific Exceptions
John Robbins
Code download available at: Bugslayer2006_11.exe (164 KB)
Contents
Getting Started
Minidumps on All .NET Exceptions
Minidumps on a Specific .NET Exception
What About Vectored Exception Handling?
Wrap Up
Tips
Necessity is the mother of invention, and it's usually when you're faced with a nasty bug that most of the invention happens in software development. Recently I was working on a super nasty bug in an ASP.NET 2.0 application and needed to see the state of the worker process exactly at the point where it threw a specific exception. By the time I got the exception in the catch block, it was too late. Because the bug only showed up in the production environment, because it was a mission-critical application (as always!), and because I needed to fix the bug yesterday, there was no way I would have the luxury of poking at the application with the Visual Studio® 2005 debugger. Thus began the quest to find a way to get a minidump of the ASP.NET worker process at the instant an exception is thrown.
In this installment of Bugslayer, I'll cover the use of ADPlus to create a minidump of your Microsoft® .NET Framework 2.0 processes on specific exceptions. Because the configuration files for ADPlus are now mostly documented, we can jump right in. When I first started thinking about the problem, I wasn't sure it would be possible to get a minidump for just a single .NET exception type and not others. However, I was surprised at just how simple it turned out to be. I'll end the column with my attempt to fake out .NET and get the same information written from my code without using any external tools. As you can guess from my usage of the word "attempt," I failed, but it was a worthy exercise.
If you're not familiar with ADPlus, it's probably the world's largest VBScript program, and it drives the console debugger (CDB) to snap minidumps, set breakpoints, and otherwise help you automate the debugging of server-based applications in particular. ADPlus is part of the Debugging Tools for Windows® package, which you can download for free. Documentation for ADPlus is in the Debugger.chm in the same directory where ADPlus.vbs resides. You can read more about ADPlus in the March 2005 Bugslayer column. For this month's column, I'm assuming that you've read the ADPlus documentation.
Before I do anything else, I have to remind you that CDB is a native program debugger and doesn't know anything about .NET. Only if you load the Son of Strike (SOS) debugging extension, located in the .NET Framework installation directory, can you get any .NET-specific information from inside CDB. With that in mind, all the ADPlus documentation speaks in terms of native debugging. The key point to remember when reading the documentation is that, in .NET terms, the first-chance exception notification arrives when the exception is thrown, and the second-chance notification arrives when no handler caught it and the application is about to crash with an unhandled exception.
Getting Started
The source code download for this month's column contains all the configuration files I'm going to talk about, and you may want to consider running them as you read through the descriptions. As far as ADPlus and CDB are concerned, there's no difference between W3WP.EXE, the ASP.NET worker process for IIS 6.0, and a Windows Forms application. To make it easier to play with the different configurations, I included a simple application, ExceptionMaker, which throws three different exceptions so you can see how the different ADPlus configuration files perform.
One thing that's tricky about ADPlus is that all command-line options that take files or paths need to have the complete drive and path included. In each of the configuration file directories are some batch files that make it easier to test the configurations, but you'll need to edit them to include the complete path to where you installed the code. All the batch files assume you want to connect to a single running instance of ExceptionMaker.exe. Also, all the batch files assume ADPlus.vbs is in the path.
CDB, because it's a native debugger, only knows about native structured exception handling (SEH) exceptions such as access violations (code 0xC0000005) and the like. Fortunately, under the covers, .NET exceptions are raised as SEH exceptions. If that weren't the case, you couldn't use ADPlus to run CDB to catch .NET exceptions. If you're curious, the native exception code for a .NET exception is 0xE0434F4D.
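To make the two codes concrete, here is a small, portable C++ sketch (not part of ADPlus or CDB; the function names are hypothetical) that classifies a raw SEH exception code. It also shows why 0xE0434F4D is often called the "COM" exception: its low three bytes are the ASCII characters 'C', 'O', and 'M'.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// SEH exception codes discussed in the text.
constexpr std::uint32_t kAccessViolation = 0xC0000005u;  // EXCEPTION_ACCESS_VIOLATION
constexpr std::uint32_t kClrException    = 0xE0434F4Du;  // raised for every .NET exception

// Hypothetical helper: turn a raw exception code into a short description.
std::string DescribeExceptionCode(std::uint32_t code) {
    switch (code) {
        case kAccessViolation: return "access violation";
        case kClrException:    return ".NET (CLR) exception";
        default:               return "other native exception";
    }
}

// Decode the low three bytes of an exception code as ASCII characters.
std::string LowBytesAsAscii(std::uint32_t code) {
    std::string text;
    text += static_cast<char>((code >> 16) & 0xFF);
    text += static_cast<char>((code >> 8) & 0xFF);
    text += static_cast<char>(code & 0xFF);
    return text;
}
```

For example, LowBytesAsAscii(0xE0434F4D) yields "COM", which is all a native debugger has to go on when deciding that an exception came from .NET.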
Because I'm talking about exceptions, I'm going to have ADPlus run in crash mode. That's not as scary as it sounds: the application keeps running, and whenever an exception is thrown, CDB gains control to do your bidding. When running in crash mode, CDB stays attached to the process until the process ends or you press Ctrl+C in the CDB window. If you do abort the execution with Ctrl+C, CDB will terminate the debuggee.
With CDB attached in crash mode, your application is basically running at full speed. That means it's perfect for running on your production systems. However, if you're not careful, the actions you attach to exceptions can have a huge impact on performance. If you configure ADPlus to have CDB do a minidump on every first-chance exception in your large ASP.NET application, you could be writing out many minidumps, and each could take many minutes to write to disk. As with any debugging job, you'll want to configure ADPlus to do the minimum work necessary.
Minidumps on All .NET Exceptions
When I first looked at the (spartan) documentation for the <Exceptions> element of a configuration file, I thought it was going to be easy to get a minidump on all first-chance exceptions. I took the description of FullDumpOnFirstChance at face value and created the following configuration file:
<ADPlus>
  <Settings>
    <RunMode>CRASH</RunMode>
    <Option>Quiet</Option>
  </Settings>
  <Exceptions>
    <Option>FullDumpOnFirstChance</Option>
  </Exceptions>
</ADPlus>
While FullDumpOnFirstChance does create a minidump on each exception, it doesn't give each dump a different file name. Thus every .NET exception writes to the same file, overwriting the previous dump, so you're left with only the most recent exception, whatever its type.
Fortunately, the <Config> element under the <Exceptions> element provides an alternative that writes each exception to its own uniquely named minidump file. The configuration file looks like Figure 1.
Figure 1 Using AllExceptions
<ADPlus>
  <Settings>
    <RunMode>CRASH</RunMode>
    <Option>Quiet</Option>
  </Settings>
  <Exceptions>
    <Config>
      <Code>AllExceptions</Code>
      <Actions1>FullDump</Actions1>
    </Config>
  </Exceptions>
</ADPlus>
The AllExceptions keyword tells ADPlus that you want the configuration to apply to all exceptions and events that CDB processes. The <Actions1> element contains the commands you want to run on the first-chance exception. In this case, I'm telling ADPlus to configure CDB to write a full memory minidump with a unique name. You can specify a few other options in the <Actions1> element; they're discussed in the documentation. If you wanted an SOS command to run when exceptions are thrown, you could specify it in the <CustomActions1> element. In most cases, you won't need to configure any second-chance exception processing because the default settings write a full memory minidump automatically.
If you read the ADPlus configuration documentation, you'll note that the discussion of the <Config> element shows that you need to include a <ReturnAction1> element to tell ADPlus which Go or Quit command to run when processing the exception. With the AllExceptions keyword I'm using in Figure 1, however, ADPlus rejects the <ReturnAction1> element. The error it reports, "Invalid return action1 in exception command: <cmd>," is misleading, because the command itself is valid. After reading through the ADPlus source code and running a few tests, I learned that the <ReturnAction1> element is only used if you're specifying a specific exception in the <Code> element.
The main drawback to the AllExceptions approach is that in addition to minidumps for all native exceptions, you also get minidumps for module loads, module unloads, and process exit. While the minidump generated when the process ends can be useful if you're tracking down COM interop problems, having the debuggee stop to write a dump every time a module loads into the address space is a large waste of disk space and time. However, if you're dealing with an application that has a lot of native code in it, you may want to consider using the configuration in Figure 1 in your test environment. No matter why an exception is thrown, you'll have a great record of exactly where it occurred.
In my original problem there was no native code other than what the .NET Framework brought into the address space. Setting up an ADPlus configuration that only created minidumps for .NET exceptions required tweaking the <Code> element and turning off all other minidump creation (see Figure 2).
Figure 2 .NET-Specific Dumps
<ADPlus>
  <Settings>
    <RunMode>CRASH</RunMode>
    <Option>Quiet</Option>
  </Settings>
  <Exceptions>
    <Option>NoDumpOnFirstChance</Option>
    <Config>
      <Code>clr</Code>
      <Actions1>FullDump</Actions1>
      <ReturnAction1>gn</ReturnAction1>
    </Config>
  </Exceptions>
</ADPlus>
First I set the <Option> element to NoDumpOnFirstChance to turn off minidump creation for all exceptions and events. Then, in the <Code> element, where you're allowed to configure a single exception or a list of exceptions, I set all .NET exceptions to create a full memory minidump. If I had also wanted a full memory minidump on an access violation, I would have specified clr;av in the <Code> element. One limitation of ADPlus is that you can only have one <Code> element, so you can't have different processing for different exceptions.
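The effect of a semicolon-separated <Code> list such as clr;av can be modeled in a few lines. This is a hypothetical illustration, not ADPlus code: it maps only the two mnemonics used in this column to their exception codes (clr to 0xE0434F4D, av to 0xC0000005) and reports whether a given exception would trigger the configured first-chance dump when NoDumpOnFirstChance has switched everything else off.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical model of an ADPlus <Code> list such as "clr;av".
// Only the two mnemonics used in the column are mapped here.
std::set<std::uint32_t> ParseCodeList(const std::string& list) {
    std::set<std::uint32_t> codes;
    std::istringstream stream(list);
    std::string item;
    while (std::getline(stream, item, ';')) {
        if (item == "clr") codes.insert(0xE0434F4Du);
        else if (item == "av") codes.insert(0xC0000005u);
        // Unknown mnemonics are ignored in this sketch.
    }
    return codes;
}

// With NoDumpOnFirstChance set, only listed codes produce a first-chance dump.
bool WantsFirstChanceDump(const std::set<std::uint32_t>& codes, std::uint32_t code) {
    return codes.count(code) != 0;
}
```

With "clr" alone, an access violation passes through without a dump; adding ";av" brings it into the same single processing path, which is exactly the one-<Code>-element limitation described above.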
Minidumps on a Specific .NET Exception
While getting minidumps for all .NET exceptions is nice, if your application throws a few too many gratuitous .NET exceptions, the blocking that occurs every time a dump is written would be a big problem. In the bug that precipitated all this poking, I needed to capture the instant an OutOfMemoryException was thrown.
If you've looked at the .NET Framework 2.0 version of SOS, you've seen the !stoponexception command (abbreviated !soe), which breaks on a specific .NET exception when it's thrown during live debugging. According to the help, issuing the following command breaks when an ArgumentNullException is thrown:
!soe -create System.ArgumentNullException 3
The "3" selects the $t3 pseudo-register: the command stores 1 there if the exception is an ArgumentNullException and 0 if it's anything else. By the way, the help for !stoponexception has two mistakes in it. First, it says the command automatically uses pseudo-register number one if you don't specify a register; however, if you leave the pseudo-register off, the command doesn't work. Second, the help text showing the use of the sxe command is incorrect: the command shown needs "clr" on the end.
When I first used the !stoponexception command, I wondered how it worked. Since the only way the native debugger knows that a .NET exception occurred is when SEH exception code 0xE0434F4D shows up, I figured !stoponexception had to be doing a bit more work behind the scenes. Running the debugger's sx command, which shows the exception settings and their associated commands, returns the following for .NET exceptions:
clr - CLR exception - break - not handled
       Command: "!soe System.ArgumentNullException 3;
                 .if(@$t3==0) {g} .else {.echo 'System.ArgumentNullException hit'}"
Note that the CDB/WinDBG code for .NET exceptions is clr. (You'll notice that I wrapped the line beginning with "Command:" to fit in the column.)
When you use the !stoponexception command, it registers itself as the command the debugger runs on every .NET exception. When a .NET exception is thrown, that invocation of !stoponexception checks whether the exception is of the specified type and puts 1 in the pseudo-register if it matches (0 otherwise). The interesting part is the conditional that follows: .if(@$t3==0) {g} resumes execution for every other exception type, while the .else branch echoes a message and leaves the debugger stopped on the exception you asked for.
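The division of labor between !stoponexception and the .if test can be sketched as ordinary code. In this hypothetical C++ model (not SOS source), an array stands in for the $t0 through $t19 pseudo-registers, StopOnException plays the role of !soe by writing 1 or 0 into the chosen register, and ShouldDump performs the same test as the .if in the configuration files. The sketch compares type names exactly, which is a simplifying assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the debugger's user pseudo-registers $t0 through $t19.
using PseudoRegisters = std::array<unsigned long long, 20>;

// Models "!soe <type> <n>": store 1 in $t<n> on a type match, else 0.
void StopOnException(const std::string& thrownType, const std::string& watchedType,
                     std::size_t reg, PseudoRegisters& regs) {
    regs.at(reg) = (thrownType == watchedType) ? 1 : 0;
}

// Models "!soe System.ArgumentNullException 3; .if(@$t3==1) { .dump ... }":
// returns true when a dump should be written for the thrown type.
bool ShouldDump(const std::string& thrownType) {
    PseudoRegisters regs{};
    StopOnException(thrownType, "System.ArgumentNullException", 3, regs);
    return regs[3] == 1;
}
```

Keeping the type test (the extension) separate from the action (the debugger's conditional) is what lets you swap .echo for .dump without touching !soe itself.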
The configuration file in Figure 3 shows how you can get the minidump on a specific exception. Note that I spaced out and wrapped the lines in the <CustomActions1> element so they would fit on the page. In the real configuration files, you can do the same, but make sure to end all commands with a semicolon like I have.
Figure 3 Get Minidump on a Specific Exception
<ADPlus>
  <Settings>
    <RunMode>CRASH</RunMode>
    <Option>Quiet</Option>
  </Settings>
  <Exceptions>
    <Option>NoDumpOnFirstChance</Option>
    <Config>
      <Code>clr</Code>
      <Actions1>Void</Actions1>
      <CustomActions1>
        .loadby sos mscorwks;
        !stoponexception System.ArgumentNullException 3;
        .if(@$t3==1)
        {
          .dump /ma /u c:\\dumps\\ANE.dmp
        }
      </CustomActions1>
      <ReturnAction1>gn</ReturnAction1>
    </Config>
  </Exceptions>
</ADPlus>
The difference between this configuration and the previous one that wrote a minidump on all .NET exceptions is that I'm taking advantage of the <CustomActions1> element, which specifies the custom commands you want to run on first-chance exceptions. I'm also setting the <Actions1> element to Void so that none of the default commands run and I'm only getting the dump on the single .NET exception I want to track.
When a .NET exception occurs, the first command that runs loads the SOS extension from the directory where MSCORWKS.DLL was loaded. You could also put the .loadby command in the configuration file <PreCommands> section, which contains commands that are executed as soon as the debugger attaches or loads the process. The second command is where I ask the !stoponexception command to check the exception type, and if it matches, set the pseudo-register specified to one. The .if checks the value, and if it's one will cause the .dump command to execute.
Unfortunately, there's no way in the configuration file to specify that the dump should be written to the directory where all other output files are written. But at least the /u option appends the date, time, and process ID to the dump file name, so successive dumps don't overwrite one another. It would be nice if you could insert a special code in the appropriate elements to indicate the output directory.
One important point about using debugger command programs is that there's a small bug in their parsing. Notice that I doubled the backslashes in the path passed to the .dump command. If you use single backslashes, your dumps are written to the root directory instead, provided the account running the debugger has write access there.
For some bugs you may need different minidumps for several exception types. Fortunately, that's easy because debugger command programs support .elsif. The configuration file in Figure 4 writes a different dump for each of two .NET exceptions.
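One way to picture why the doubling matters is to assume that the command-program parser treats a backslash as an escape character and keeps only the character that follows it. That is an assumption made for illustration, not documented debugger behavior, but it matches the symptom: under this hypothetical model, c:\\dumps\\ANE.dmp becomes c:\dumps\ANE.dmp, while the single-backslash form collapses into c:dumpsANE.dmp, a name with no directory separators at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of escape processing: a backslash escapes the next
// character, so "\\" yields one backslash and "\d" yields a plain 'd'.
std::string UnescapeCommandText(const std::string& text) {
    std::string result;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size()) {
            result += text[++i];  // keep only the escaped character
        } else {
            result += text[i];
        }
    }
    return result;
}
```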
Figure 4 Different Dumps for Each .NET Exception
<ADPlus>
  <Settings>
    <RunMode>CRASH</RunMode>
    <Option>Quiet</Option>
  </Settings>
  <Exceptions>
    <Option>NoDumpOnFirstChance</Option>
    <Config>
      <Code>clr</Code>
      <Actions1>Void</Actions1>
      <CustomActions1>
        .loadby sos mscorwks;
        !stoponexception System.ArgumentNullException 3;
        !stoponexception System.IO.FileNotFoundException 4;
        .if(@$t3==1)
        {
          .dump /ma /u c:\\Dumps\\ANE.dmp
        }
        .elsif(@$t4==1)
        {
          .dump /ma /u c:\\Dumps\\FNFE.dmp
        }
      </CustomActions1>
      <ReturnAction1>gn</ReturnAction1>
    </Config>
  </Exceptions>
</ADPlus>
If you need to handle multiple conditionals, watch out for a second parsing problem in debugger command programs: the keyword is .elsif, with no "e" after "els." If you type the incorrect .elseif, the debugger terminates as soon as it evaluates the invalid statement.
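Because both parsing traps only show up at run time, it can be worth checking <CustomActions1> text before deploying a configuration. The hypothetical checker below (not an ADPlus feature) flags the misspelled .elseif keyword and any run of backslashes with odd length, which indicates an undoubled backslash in a path. The substring tests are deliberately simple and would need hardening for real use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-deployment check for <CustomActions1> command text.
std::vector<std::string> LintCustomActions(const std::string& text) {
    std::vector<std::string> problems;
    if (text.find(".elseif") != std::string::npos) {
        problems.push_back("use .elsif, not .elseif");
    }
    // Any run of backslashes with odd length contains an undoubled backslash.
    for (std::size_t i = 0; i < text.size();) {
        if (text[i] != '\\') { ++i; continue; }
        std::size_t run = 0;
        while (i < text.size() && text[i] == '\\') { ++run; ++i; }
        if (run % 2 != 0) {
            problems.push_back("undoubled backslash in path");
            break;  // one report is enough for this sketch
        }
    }
    return problems;
}
```

Run against command text containing ".elseif" and a path such as c:\\Dumps\FNFE.dmp, it reports both problems; the corrected text produces an empty list.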
What About Vectored Exception Handling?
Once I had ADPlus doing all the heavy lifting to snap minidumps, I had to wonder if there was a way I could integrate similar techniques into my applications without relying on external tools. Having done Windows development for a little too long, I remembered that in native code you could get all exceptions reported in your application through vectored exception handling.
Back in the September 2001 issue of MSDN® Magazine, my good friend Matt Pietrek wrote an excellent introduction to vectored exception handling. What makes it so interesting is that from native code you can install a callback for all exceptions and be notified before the system searches the frame-based handler chain or unwinds anything. Under the covers, .NET exception handling uses structured exception handling. My plan was to see if there was a way to write some sneaky .NET code that would allow me to fake out .NET, get notification for all exceptions, and do it all without disturbing the .NET infrastructure.
Whenever you even consider doing anything a little out of the ordinary, spending some quality time searching the Web is well worth the effort. A bit of searching turned up Mike Stall's excellent blog entry on vectored exception handling and .NET. As Mike is a developer on the Common Language Runtime (CLR) Debugging Team, I pay attention to anything he says, and he's quite clear that using vectored exception handling from .NET is definitely not a good idea. However, if you read his blog entry closely, you'll see he's mainly talking about attempting to use .NET code to implement your vectored exception handling.
Mike doesn't say anything specifically about implementing your vectored exception handling from pure native code, so that got me thinking. I wanted to see if I could do all the vectored exception handling through native C++ code so that I could snap dumps of specific exceptions without having the Debugging Tools for Windows installed on the machine. If I could do this, I could offer a great diagnostic tool. If your end users were reporting even intermittently reproducible problems in your application, you could have them toggle a setting, and on each exception of a specific type, you could write an SOS-compatible minidump whenever that exception bubbled through your system. If you've been reading the Bugslayer column for a while, you know that this is the type of functionality that interests me quite a bit.
Keeping Mike Stall's blog in mind, my plan was to come up with a solution that would avoid messing up .NET but allow you to get the minidumps when you needed them. Mike says very explicitly that you can't call any .NET code from the vectored exception handler and that you only want to consider touching .NET exceptions. That seemed fair enough, so I started on my quest to build the library.
Obviously, my first task was to write the native C++ DLL that I could call from managed code to deal with all the vectored exception handling needs. There wasn't anything exciting about the core code and I was able to get everything hooked up so no matter how I created an exception, or what exception I tried, my native vectored exception handler was called.
My next step was to try to figure out the type of the .NET exception being thrown. If you look at the native prototype for the vectored exception handling code, your native callback function receives the PEXCEPTION_POINTERS, which completely describes the exception at the instant it's thrown. Since all .NET exceptions are 0xE0434F4D, it's easy to see a .NET exception, but I wanted to zoom in a little closer on the particular exception.
After a good amount of experimentation, I noticed that the pExceptionInfo->ExceptionRecord->NumberParameters value was always one on any type of .NET exception. The NumberParameters field shows how many items in the pExceptionInfo->ExceptionRecord->ExceptionInformation array describe additional data about the exception. Where an access violation would put the address you were trying to read from or write to in the ExceptionInformation array, a .NET Framework exception has a different value.
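The filtering described above can be sketched as a native vectored handler. This is a minimal, hypothetical C++ sketch rather than the column's library: it reads ExceptionInformation[0] of a 0xE0434F4D record (the single parameter observed above), stores it in an atomic variable, and always returns EXCEPTION_CONTINUE_SEARCH so the CLR's own handling proceeds untouched. On Windows it uses the real types from <windows.h> and is registered with AddVectoredExceptionHandler; elsewhere it defines minimal stand-in types, an assumption that mirrors only the fields used, so the logic can be compiled and exercised.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
// Minimal stand-ins mirroring only the <windows.h> fields this sketch reads.
typedef long LONG;
struct EXCEPTION_RECORD {
    std::uint32_t ExceptionCode;
    std::uint32_t NumberParameters;
    std::uintptr_t ExceptionInformation[15];
};
struct EXCEPTION_POINTERS {
    EXCEPTION_RECORD* ExceptionRecord;
    void* ContextRecord;
};
#define EXCEPTION_CONTINUE_SEARCH 0
#define CALLBACK
#endif

constexpr std::uint32_t kClrExceptionCode = 0xE0434F4Du;

// Last value seen in ExceptionInformation[0] for a .NET exception (0 = none yet).
std::atomic<std::uint32_t> g_lastClrInfo{0};

// Vectored handler: observe only. It never handles the exception and never
// calls into managed code, following the cautions cited in the text.
LONG CALLBACK ClrObservingHandler(EXCEPTION_POINTERS* info) {
    const EXCEPTION_RECORD* rec = info->ExceptionRecord;
    if (rec->ExceptionCode == kClrExceptionCode && rec->NumberParameters >= 1) {
        g_lastClrInfo.store(static_cast<std::uint32_t>(rec->ExceptionInformation[0]));
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

#ifdef _WIN32
// A nonzero first argument puts the handler at the front of the handler list.
void InstallClrObserver() { AddVectoredExceptionHandler(1, ClrObservingHandler); }
#endif
```

Storing into an atomic rather than printing keeps the handler free of CRT locks, which matters because it can run on any thread at any moment.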
My original hope was that the extra information about a .NET exception would be the metadata ID (token) of the actual exception type. You can see metadata ID values by running ILDASM with the /OUT and /TOKENS switches, which dumps all the metadata information for your assemblies. After dumping numerous .NET runtime and custom assemblies, I didn't find any value in the ExceptionInformation array that matched a metadata ID.
I started poking around the CLR headers in <VS Install Directory>\SDK\v2.0\include, which contain the complete native declarations for everything .NET. Looking at the specific types I was throwing, I eventually stumbled on CORERROR.H, which contains all the HRESULT values for errors that occur in the native .NET interfaces. A little study of the values I was seeing in the ExceptionInformation array led me to the "Classlib errors" section, which contains the HRESULT values of key exceptions in .NET. While extremely important values like 0x80020012 indicate a divide by zero, all programmatic exceptions are lumped under the same HRESULT, so I wasn't going to be able to easily differentiate a RankException from a TimeoutException.
Since Mike's blog is very specific about the fact that vectored exception handling can completely mess up managed code, I was worried that even calling into the native interfaces could cause issues. Additionally, with the native interfaces, there is no clean way to jump into .NET from a native application to grab the particular exception. I found the best I could do was look for specific values in the ExceptionInformation array. The CORERROR.H file lists the particular HRESULT codes for all the super-critical .NET exceptions, such as OutOfMemoryException.
While not perfect, at least you could specify the super-critical exceptions you wanted to see, which looked promising enough. I turned my attention to figuring out how I was going to write the minidump inside my native vectored exception handler. The documentation for MiniDumpWriteDump is quite clear that the function should be called from a separate process whenever possible, rather than from within the process being dumped.
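Given a value pulled from the ExceptionInformation array, the "super-critical exceptions only" filter reduces to a lookup in a caller-supplied list. In the hypothetical sketch below, 0x80020012 (divide by zero) comes from the text, and 0x8007000E is assumed to be the CORERROR.H value for OutOfMemoryException (COR_E_OUTOFMEMORY, which maps to E_OUTOFMEMORY); verify both against your copy of the header before relying on them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed values; verify against CORERROR.H in your SDK.
constexpr std::uint32_t kDivideByZero = 0x80020012u;  // value cited in the column
constexpr std::uint32_t kOutOfMemory  = 0x8007000Eu;  // assumed COR_E_OUTOFMEMORY

// Returns true when the observed value is one the user asked to capture.
bool IsWatchedClrFailure(std::uint32_t observed, const std::vector<std::uint32_t>& watched) {
    return std::find(watched.begin(), watched.end(), observed) != watched.end();
}
```

In a full implementation, a match here would be the trigger for handing the dump off to a separate process, since most programmatic exceptions share one value and can't be told apart this way.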
After hours of poking around, I thought the best way to get the minidump was to do it from another process. That meant I was going to spawn off that process from the native vectored exception handling function. It turned out that the minidump writing process was fairly trivial to build in .NET.
With all the pieces in place, I was finally able to start doing some serious testing to see if my scheme was going to work. I tried every type of .NET application from Windows Forms, to console applications, to a slew of ASP.NET examples. Interestingly, I found that my big scheme would generally work, sort of. In many runs, everything worked as expected, but some runs caused an endless loop of 0x80010108 errors when I was debugging with both managed and native debugging turned on in the process. No matter what I tried on different machines, I couldn't get everything working consistently. While I hate to admit failure, this is one case where I'm quite positive I've gotten to the bottom of the well.
Wrap Up
If you've read the Bugslayer column over the years, you know I won't hesitate to do some serious hacking to get a trick working to provide a cool tool. However, in the case of vectored exception handling, I had to throw in the towel and declare the path a dead one. Mike Stall wasn't kidding when he said vectored exception handling wasn't a great idea in .NET. As similar things had been said about other sneaky tricks in the past, I don't consider my time wasted, because I hope to save you the time of going down the same path I did.
Even without the vectored exception handling tricks, we still have the outstanding ADPlus and configuration files to get those dumps right when we need them. While it does require the Debugging Tools for Windows on the production machine, at least it's an XCOPY installation. I hope I was able to show you the power of the tools and how to think a bit outside the box on using them to get exactly the minidump you need to solve the hardest of problems.
Tips
Tip 75 Are you still compiling a bunch of .NET Framework 1.1 code that you can't move over to the .NET Framework 2.0 just yet? You probably feel like a kid with your face pressed up against the pet store window wishing you could use that cool MSBUILD system. The great news is that Microsoft has seen your smudges and has released MSBee, an extension to MSBUILD to allow compiling of .NET Framework 1.1 applications. Grab the code and the documentation.
Tip 76 Are you hot for Visual Studio 2005 Debugger Visualizers like I am? If so, make sure to check out a great list of tips by Frans Bouma on tricks learned the hard way. Great stuff!
Send your questions and comments for John to slayer@microsoft.com.
John Robbins is a cofounder of Wintellect, a software consulting, education, and development firm that specializes in the .NET and Windows platforms. His latest book is Debugging Microsoft .NET 2.0 Applications (Microsoft Press, 2006). You can contact John at www.wintellect.com.