A while back, a nifty DLL called IMAGEHLP.DLL appeared from deep within the Windows NT® team. IMAGEHLP provided APIs for reading and modifying Portable Executable files, as well as rudimentary code for working with debug symbols. In a nutshell, IMAGEHLP was a dumping ground for various routines used by linkers, debuggers, and related tools. At first, IMAGEHLP.DLL was a redistributable DLL, but eventually it became important enough to make part of the operating system.
As time went on, IMAGEHLP's debugging and symbol management functionality grew. Eventually, Microsoft split the executable image manipulation code away from the symbol-related code. The resultant symbol code ended up in DBGHELP.DLL. DBGHELP has become an official, Microsoft-endorsed mechanism for reading all types of debug information generated by Microsoft tools. In addition, DBGHELP knows all the tricks to walk a call stack on multiple CPU platforms. DBGHELP.DLL encapsulates a lot of nasty, tricky, OS and version-specific code so that you don't have to write it yourself.
Both John Robbins (of Bugslayer fame) and I have covered features of IMAGEHLP and DBGHELP in past columns in MSDN® Magazine. This month, I'll cover some newly added features that take DBGHELP to a whole new level. I've also provided a nifty class (WheatyExceptionReport) that can be easily dropped into C++-based projects to provide a detailed crash report, including the names and values of all local and global variables at the time of a program crash.
DBGHELP has gone through a number of iterations. I'm using version 5.1 here, which comes with Microsoft® Windows® XP. The sample code for this column absolutely requires this latest version (which is redistributable) in order to run.
DBGHELP is not the only way to read debug information. The COFF and CodeView symbol formats are documented, so you can read them directly. In addition, there is the PDB format, which somewhat resembles CodeView® internally. Microsoft has not documented the PDB format, but instead has provided tool vendors with a private API for reading and writing PDB files.
More recently, Visual Studio® .NET introduced a new version of the PDB format, which is identified by an RSDS signature. The Debug Information Access (DIA) SDK, which is currently supplied in Visual Studio .NET betas, can read RSDS and earlier format files. Going forward, the DIA APIs are the official, Microsoft-endorsed method of accessing symbolic information from Microsoft code, both managed and unmanaged. However, the DIA APIs are COM-based and not for the faint of heart.
The beauty of DBGHELP is that its mission in life is to protect you from all the hassles of needing to know how to read all the various formats. DBGHELP provides a nice, relatively simple API on top of a rat's nest of symbol management and stack walking code. Having looked at both DIA and DBGHELP, I think DBGHELP has a flatter learning curve.
What's New in DBGHELP 5.1?
Version 5.1 of DBGHELP.DLL offers several new sets of functions, as well as support for the latest Visual Studio .NET debug formats. Most of the new functions offer brand new functionality. However, a few just revise or extend previous DBGHELP APIs to provide better functionality. All of the old APIs are still there, so your existing code that uses DBGHELP shouldn't break.
The first detail I noticed when examining the new features of DBGHELP 5.1 was that the existing documentation was lacking some important information. Although not incorrect, there were major gaps that needed to be filled before the scope of the new functionality could be appreciated. In this column, I'll attempt to fill in some of those gaps.
To start the tour of new DBGHELP features, consider the symbol information obtained by older APIs such as SymGetSymFromAddr and SymEnumerateSymbols. There's minimal information about a symbol such as its name and address, but there's no type information whatsoever. DBGHELP 5.1 introduces a much more complete and consistent way of describing symbols.
Part of the reason for reworking the way DBGHELP works with symbols is that it now supports local variables and parameters. Using the pre-5.1 APIs, you'd only get symbol information for functions and variables at global scope. The way that the CPU references locals and parameters is different from globals, so a new, more descriptive means of describing a symbol is needed.
Another reason for reworking the symbol support is that DBGHELP 5.1 includes partial support for symbol types. A symbol type can be a basic type (such as an integer or a float) or a more complex user-defined type (that is, structures, unions, enums, and so on). Future versions of DBGHELP may offer even more complete symbol type support. However, using the new SYMBOL_INFO structure and some additional code, you can do things like deconstruct all the members of a structure down to their basic types.
DBGHELP 5.1 contains a few new functions for source file support. Although not documented in the August 2001 SDK, the SymEnumSourceFiles API is self-explanatory and follows the same enumeration model as the other DBGHELP enumeration APIs. The SymFindFileInPath API lets applications use the standard debugger logic for locating symbol files. It's not uncommon for an executable and its associated symbol file to be in separate directories, so debuggers often have to go hunting around to find the appropriate symbol files.
The next new addition to DBGHELP 5.1 relates to hunting down symbol files. With version 5.1 of DBGHELP comes symbol provider capabilities. Essentially, a symbol provider DLL is a way for DBGHELP to call an external DLL and have the DLL provide the symbol file. The primary purpose of this feature is to allow symbol files to be provided on-demand over the Internet. In the latest downloadable versions of WinDbg, the debugger uses a Microsoft-written symbol provider that automatically downloads the latest symbol files on demand for newer Microsoft operating systems and other selected products.
Finally, DBGHELP 5.1 provides MiniDump capabilities. At any point (including when a program crashes), DBGHELP can create a minidump file that contains basic information about the program state such as its threads, modules, and certain locations in memory. In theory, a minidump file can be read by current versions of WinDbg and Visual Studio .NET, although I've had trouble doing so in my simple test cases.
Drilling into the New Symbol and Type Information
DBGHELP 5.1 introduces the SYMBOL_INFO structure, which is used by a new set of APIs for looking up and enumerating symbols. A SYMBOL_INFO structure contains much more than just a name and address. The new DBGHELP 5.1 APIs, namely SymEnumSymbols, SymFromAddr, and SymFromName, all return SYMBOL_INFO structures.
Now let's take a quick look at some of the fields of a SYMBOL_INFO structure. For starters, it has a TypeIndex field from which information about the symbol's type can be learned. For instance, if a symbol represents a structure, it's possible to enumerate the members of structures and classes, as well as get their types, sizes, and offsets within the structure.
The SYMBOL_INFO.TypeIndex member is useless by itself. It doesn't contain any magic values (such as CodeView format constants) that can be decoded. However, the TypeIndex member can be passed to the new SymGetTypeInfo API to retrieve information about the symbol or its underlying type. SymGetTypeInfo returns a wide variety of information, but its documentation is less than clear. I'll describe some of the more interesting options later and leave you to experiment with other options.
The next interesting field in the SYMBOL_INFO structure is the ModBase, which is the load address for the module containing the symbol. You'll need to pass this value to the SymGetTypeInfo API to get correct results whenever you happen to be working with multiple symbol tables.
The SYMBOL_INFO.Flags member provides useful information about a symbol, such as whether it's a local variable, a parameter, or a global variable. If the symbol is a frame-based local variable or parameter, the IMAGEHLP_SYMBOL_INFO_LOCAL flag is set. If the symbol is a parameter, the IMAGEHLP_SYMBOL_INFO_PARAMETER flag is also set. The use of these two flags can be confusing at first, but at least the behavior is consistent.
If the symbol is a frame-based local variable or parameter, the SYMBOL_INFO.Address member provides the offset of the variable relative to the stack frame. In this case, the IMAGEHLP_SYMBOL_INFO_REGRELATIVE flag is on, and the SYMBOL_INFO.Register field contains an undocumented enumeration indicating which register holds the frame pointer. Currently, this value seems to always be 8, indicating the EBP register. Be warned, however, that the encoding of the Register member may change at some point in the future.
If the parameter or local variable resides in a register, the IMAGEHLP_SYMBOL_INFO_REGISTER flag is set in the SYMBOL_INFO.Flags member. In that case, the SYMBOL_INFO.Register field uses the same undocumented enumeration I just described to indicate which register holds the value. Of course, if the symbol is a global variable, the SYMBOL_INFO.Address field is an actual linear address for the variable.
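To make the flag rules above concrete, here is a minimal sketch of how a report generator might classify a symbol and work out where its value lives. It is deliberately portable: SymbolView is a hypothetical, trimmed-down stand-in for SYMBOL_INFO, and the kFlag constants are placeholders that are believed to match the IMAGEHLP_SYMBOL_INFO_* values in DBGHELP.H but should be replaced by the real header definitions in actual code. The sketch also assumes that a negative frame offset is stored as a 64-bit two's complement value.

```cpp
#include <cstdint>
#include <string>

// Placeholder flag values so the sketch compiles without the Windows SDK.
// Real code should use the IMAGEHLP_SYMBOL_INFO_* constants from DBGHELP.H.
constexpr std::uint32_t kFlagRegister    = 0x08; // value lives in a register
constexpr std::uint32_t kFlagRegRelative = 0x10; // Address is an offset from a register
constexpr std::uint32_t kFlagParameter   = 0x40; // symbol is a parameter
constexpr std::uint32_t kFlagLocal       = 0x80; // symbol is a local or a parameter

// Hypothetical subset of the SYMBOL_INFO fields used in this sketch.
struct SymbolView {
    std::uint32_t flags;
    std::uint64_t address; // linear address, or frame offset when register-relative
};

// Parameters carry both the LOCAL and PARAMETER flags, so test PARAMETER first.
std::string describeScope(const SymbolView& sym) {
    if (sym.flags & kFlagParameter) return "Parameter";
    if (sym.flags & kFlagLocal)     return "Local";
    return "Global";
}

// Returns false for register-resident symbols, which have no memory address.
// Otherwise stores the linear address of the variable in 'linear'.
bool resolveAddress(const SymbolView& sym, std::uint64_t framePointer,
                    std::uint64_t& linear) {
    if (sym.flags & kFlagRegister) return false;
    if (sym.flags & kFlagRegRelative) {
        // Unsigned wraparound makes a negative (two's complement) offset work.
        linear = framePointer + sym.address;
        return true;
    }
    linear = sym.address; // globals already hold a linear address
    return true;
}
```

With a frame pointer of 0x1000, a parameter at offset 8 resolves to 0x1008 and a local at offset -8 resolves to 0xFF8, while a symbol flagged as register-resident is reported as having no address.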
Continuing through the SYMBOL_INFO structure, the Tag member indicates what type of symbol is being described. The values correspond to the SymTagEnum enumeration from the CVCONST.H file which is part of the DIA SDK. Some of the more commonly encountered Tag values are as follows:
In my experience, you'll see all of these Tag values in executables with full PDB information. If you're working with the stripped PDB files that Microsoft provides for operating system components, you'll only see SymTagPublicSymbol types.
The Mysterious SymGetTypeInfo
The SymGetTypeInfo API is central to DBGHELP 5.1's support of type information. The API takes a type index parameter, a module load address, a process handle, and an IMAGEHLP_SYMBOL_TYPE_INFO enum as input. The IMAGEHLP_SYMBOL_TYPE_INFO enum specifies what information to return in the output parameter. I won't cover all the possible enum values, but I will describe the ones that I've found to be most useful.
It's important to know up front that the meaning of Type Index values is less than clear. For one thing, they're not like CodeView format type indices, where values below 0x1000 are predefined types (like integer) and values above 0x1000 are user-defined types. Also, starting with a Type Index in a SYMBOL_INFO structure, you can get a TypeId, and that TypeId must be used in certain other calls to SymGetTypeInfo. The best thing I can suggest for documentation is to look at my sample code and see how it uses Type Indexes and what it passes to SymGetTypeInfo.
To determine if a symbol is a user-defined type, you can pass a Type Index, along with TI_GET_CHILDRENCOUNT to SymGetTypeInfo. If it returns TRUE, the output parameter indicates how many children it has. If there are children, call SymGetTypeInfo again, this time passing TI_FINDCHILDREN and an appropriately initialized TI_FINDCHILDREN_PARAMS structure. See the sample code for this column for details on how to do this. Assuming you've done things correctly, you'll get back a collection of DWORD-sized Type Indexes, one for each structure or enumeration member.
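The "appropriately initialized" part is mostly about sizing a variable-length buffer. The following is a minimal sketch of that bookkeeping only, not the column's actual sample code. It assumes the TI_FINDCHILDREN_PARAMS layout is two 32-bit header fields (Count and Start) followed by Count child IDs, and it models the buffer as a plain vector of 32-bit words so that it compiles without the Windows SDK. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: word 0 = Count, word 1 = Start, words 2.. = child IDs.
constexpr std::size_t kHeaderWords = 2;

// Builds a zero-filled buffer with room for 'childCount' IDs (the value
// obtained from TI_GET_CHILDRENCOUNT) and fills in the header fields.
std::vector<std::uint32_t> makeFindChildrenBuffer(std::uint32_t childCount) {
    std::vector<std::uint32_t> words(kHeaderWords + childCount, 0);
    words[0] = childCount; // Count: how many IDs the buffer can hold
    words[1] = 0;          // Start: index of the first child to fetch
    return words;
}

// Copies the child IDs out of a buffer that the API has filled in,
// clamping Count so a bad header can't cause an out-of-bounds read.
std::vector<std::uint32_t> childIds(const std::vector<std::uint32_t>& words) {
    if (words.size() < kHeaderWords) return {};
    std::size_t count = words[0];
    const std::size_t capacity = words.size() - kHeaderWords;
    if (count > capacity) count = capacity;
    const std::uint32_t* first = words.data() + kHeaderWords;
    return std::vector<std::uint32_t>(first, first + count);
}
```

On Windows, the idea would be to pass the buffer's data pointer (cast to a TI_FINDCHILDREN_PARAMS pointer) as the output parameter of the TI_FINDCHILDREN call, then read the IDs back out.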
What can you do with these Child Type Indexes? For starters, you can pass them to SymGetTypeInfo, with the TI_GET_SYMNAME option. This time, you'll get back the name of the child member as a Unicode string. Using just the three calls to SymGetTypeInfo that I've described, you can list all members of any structure. Don't forget to free the returned Unicode strings by calling LocalFree.
While the names of the structure members are important, odds are that you'll want more information about them. That's where additional calls to SymGetTypeInfo can help out. Some of these calls work directly with the values returned from the TI_FINDCHILDREN call, while others require a different value, which you get by calling SymGetTypeInfo with the TI_GET_TYPEID option. There are no hard-and-fast rules here. The sample sources are a good place to see what I've found to work.
To get the offset of the member within the structure (or class), pass the type index and TI_GET_OFFSET to SymGetTypeInfo. Likewise, to get the size of the member, use TI_GET_LENGTH. If the type under examination is an array, you can use TI_GET_COUNT to determine how many elements are present.
The TI_GET_BASETYPE form of a SymGetTypeInfo call yields a value known as a base type. These values correlate to the BasicType enumeration in the previously mentioned CVCONST.H file from the DIA SDK. From the BasicType, you can learn if the member is a char, int, unsigned int, float, and so on. Note that the size is not implicitly part of some of these values. For example, both floats and doubles show up as btFloats. However, by determining the size of the member (which you can obtain via a different call), you can deduce whether it's a float or a double.
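As a rough illustration of combining the base type with the member size, the sketch below maps a (base type, length) pair to a readable type name. The enumerators are stand-ins for a handful of BasicType values from CVCONST.H, copied here only so the example is self-contained. Real code should include that header and would take both inputs from SymGetTypeInfo (TI_GET_BASETYPE and TI_GET_LENGTH). The describeBasicType function is hypothetical.

```cpp
#include <cstdint>
#include <string>

namespace sketch {

// Stand-ins for a few BasicType values; use CVCONST.H in real code.
enum BasicType : std::uint32_t { btChar = 2, btInt = 6, btUInt = 7, btFloat = 8 };

// Picks a display name from the base type plus the member's size in bytes.
std::string describeBasicType(std::uint32_t basicType, std::uint64_t length) {
    switch (basicType) {
    case btChar:
        return "char";
    case btInt:
        return "int" + std::to_string(length * 8);
    case btUInt:
        return "uint" + std::to_string(length * 8);
    case btFloat:
        // The base type alone can't tell float from double; the size can.
        if (length == 4) return "float";
        if (length == 8) return "double";
        return "float (" + std::to_string(length) + " bytes)";
    default:
        return "unknown";
    }
}

} // namespace sketch
```

A btFloat member of length 4 comes back as "float" and one of length 8 as "double", which is the deduction described above.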
Enumerating the local variables and parameters of a method is tricky if you don't know the secret. The first key detail to know is the SymSetContext API, which currently isn't documented very well. The purpose of SymSetContext is to let DBGHELP know which method you want to enumerate locals and parameters for.
To call SymSetContext correctly, you need to pass a pointer to a correctly initialized structure of type IMAGEHLP_STACK_FRAME. Although there are many fields in the structure, only one, InstructionOffset, is important for x86 usage. The InstructionOffset field should be set to the address within the routine for which you want to enumerate symbols. It should be a linear address, not a relative virtual address.
Having called SymSetContext correctly, call SymEnumSymbols next. This is a new API that effectively replaces the older SymEnumerateSymbols API. When you call SymEnumSymbols to enumerate locals and parameters, it's critical to pass 0 as the BaseOfDll parameter. If you pass a non-zero BaseOfDll value, the API enumerates the global variables of the specified module.
If you use SymSetContext to access the local variables in a function, be aware that the debug information is scope-aware. That is, if you declare local variables other than at the top of the function (inside of an if block, for example) you won't see that symbol if you just pass the function's starting address to SymSetContext. If you want the complete set of locals at a given point in a function, you'll need to call SymSetContext for the desired address.
Showing Off the New DBGHELP APIs
In my April 1997 column in Microsoft Systems Journal, I presented the MSJExceptionHandler code. The idea was that when an unhandled exception occurred, the code wrote out a report file with some basic information like the registers and a stack trace. To make this happen, the code installed an unhandled exception handler. Since the code relied heavily on the original IMAGEHLP APIs, this was a perfect opportunity to upgrade the code to take advantage of the newer DBGHELP APIs.
The new version is the WheatyExceptionReport class in Figure 1. This code does everything that the original code did and adds a few new features. The most important new feature is the display of names and values for parameters and locals of all functions in the faulting thread. To do this, the code takes advantage of both the new symbol enumeration and the type APIs in DBGHELP 5.1.
In addition, the WheatyExceptionReport code attempts to show the names and values of all the global variables. I say "attempts" because the type information available from DBGHELP isn't always complete enough to accurately know how to format a particular value. This is actually the case for both local and global variables. While my code for formatting symbol values can probably be improved, it does a good job using what I'd call a fairly small amount of code.
The last new feature of WheatyExceptionReport is that the call stack entries now include the source file and line number, if available. This capability comes from the SymGetLineFromAddr API, introduced in DBGHELP 5.0.
Using WheatyExceptionReport from unmanaged C++ code is incredibly easy. Simply include the source and header file in your project and rebuild. The code in Figure 1 defines a global variable called g_WheatyExceptionReport that's of type WheatyExceptionReport. In the WheatyExceptionReport constructor, the code calls SetUnhandledExceptionFilter, and sets the handler to WheatyExceptionReport::WheatyUnhandledExceptionFilter.
WheatyExceptionReport is primarily intended for use in unmanaged programs written using a Microsoft language compiler. Conceptually, there's no reason why it couldn't be used in a Microsoft .NET program, but .NET provides much better error reporting capabilities, so this code wouldn't add much value. To use WheatyExceptionReport with a language like Visual Basic® 6.0, you'll need to compile the code into a DLL, and make sure the DLL is loaded early in the application's startup so that the unhandled exception filter is installed.
When an exception occurs, the WheatyUnhandledExceptionFilter method is called with a pointer to an EXCEPTION_POINTERS structure. This structure is interrogated to find the exception type and address, as well as the register values. The filter function either creates a new file to write the report to, or appends a new report to an existing file. The report file is in the same directory as the application EXE file, has the same base file name, and an .RPT extension. For instance, if C:\FOO\BAR.EXE faulted, the report file would be C:\FOO\BAR.RPT.
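The naming rule for the report file is simple enough to sketch in a few lines. This is a hypothetical helper, not the code from Figure 1. It assumes the full path of the EXE is already in hand (on Windows, an API such as GetModuleFileName can supply it) and simply swaps the extension for .RPT.

```cpp
#include <string>

// Derives the report file path from the executable's full path by replacing
// the extension (if any) with ".RPT". The directory and base name are kept.
std::string reportPathFor(const std::string& exePath) {
    const std::string::size_type slash = exePath.find_last_of("\\/");
    const std::string::size_type dot = exePath.find_last_of('.');
    // Only treat the dot as an extension separator if it follows the last
    // path separator; a dot in a directory name doesn't count.
    const bool hasExtension =
        dot != std::string::npos &&
        (slash == std::string::npos || dot > slash);
    return (hasExtension ? exePath.substr(0, dot) : exePath) + ".RPT";
}
```

Given C:\FOO\BAR.EXE, the helper yields C:\FOO\BAR.RPT, matching the example above.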
Figure 2 shows a portion of an .RPT file. The top few lines indicate the type of the exception (in this case, an access violation) and its address. The address is given both as a linear address and a logical address (section:offset) within a module. Following that is a display of the register values at the time of the exception.
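For readers unfamiliar with the section:offset form, here is a minimal sketch of how a linear address can be converted into a logical address. It is an illustration under stated assumptions rather than the report code itself: SectionSketch is a hypothetical stand-in for the two relevant fields of a PE section header, section numbers are assumed to be 1-based (as in linker map files), and the module's load address is assumed to be known.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the relevant fields of a PE section header.
struct SectionSketch {
    std::uint32_t virtualAddress; // RVA where the section starts
    std::uint32_t virtualSize;    // size of the section in memory
};

struct LogicalAddress {
    std::uint32_t section; // 1-based section number
    std::uint32_t offset;  // offset from the start of that section
};

// Converts a linear address to section:offset form. Returns false if the
// address lies outside every section (for example, in the PE headers).
bool toLogicalAddress(std::uint64_t linear, std::uint64_t moduleBase,
                      const std::vector<SectionSketch>& sections,
                      LogicalAddress& out) {
    if (linear < moduleBase) return false;
    const std::uint64_t rva = linear - moduleBase; // relative virtual address
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const std::uint64_t start = sections[i].virtualAddress;
        const std::uint64_t end = start + sections[i].virtualSize;
        if (rva >= start && rva < end) {
            out.section = static_cast<std::uint32_t>(i + 1);
            out.offset = static_cast<std::uint32_t>(rva - start);
            return true;
        }
    }
    return false;
}
```

For a module loaded at 0x400000 whose first section starts at RVA 0x1000, the linear address 0x401234 becomes the logical address 0001:00000234.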
If you study the figure closely, you'll notice that there are actually two call stacks. The first call stack is the terse version, with one stack frame entry per line. Besides the addresses of the instruction and frame pointers, the lines also contain the name of the function and its source file information, if available. This call stack is intended to let you see quickly how control got to the faulting code.
The second call stack comes in the section titled "Local Variables and Parameters." Here, the first line of each stack entry is identical to the corresponding line in the first stack walk. Immediately following each frame is the name and value of each local variable and parameter. When the variable is a structure, the name and type of the structure are shown. For instance, the following is a local variable of type _SYSTEM_INFO, called sysinfo.
Local 'sysinfo' _SYSTEM_INFO
When showing the data members of a structure, each member is indented. The display code uses recursion to drill down through all of the nested structures, so nested members are indented in an appropriate manner.
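The recursive drill-down can be pictured with a small sketch. This is a simplified model, not the formatting code from Figure 1: MemberSketch is a hypothetical in-memory tree standing in for what the real code discovers through SymGetTypeInfo, and the four-space indent per nesting level is an arbitrary choice.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a variable: either a leaf with a value or a
// structure with nested members.
struct MemberSketch {
    std::string name;
    std::string value;                  // used only for leaf members
    std::vector<MemberSketch> children; // non-empty for nested structures
};

// Appends one line per member to 'out', indenting each nesting level by
// four spaces and recursing into nested structures.
void formatMember(const MemberSketch& member, int depth, std::string& out) {
    out.append(static_cast<std::size_t>(depth) * 4, ' ');
    out += member.name;
    if (member.children.empty()) {
        out += " = " + member.value + "\n";
        return;
    }
    out += "\n";
    for (const MemberSketch& child : member.children) {
        formatMember(child, depth + 1, out);
    }
}
```

Calling formatMember on the top-level variable with a depth of 0 produces the structure name on the first line, with each member on its own line one level deeper.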
The last section of the .RPT file is entitled "Global Variables." Here the code uses SymEnumSymbols, but passes the module handle of the faulting module. It uses the same symbol formatting function (WheatyExceptionReport::FormatSymbolValue) as used in the stack frame walk code. Again, I would caution you that this code is not bulletproof, but in my testing it does a pretty good job with most simple variable types, as well as structures. It specifically doesn't attempt to display any sort of data referenced via a pointer, with the exception of ANSI strings.
To test out WheatyExceptionReport, I created the TestExceptionHandler project. It consists of a small executable (TestExceptionHandler.CPP), along with the WheatyExceptionReport code. The executable just calls down through a few functions and creates some interesting local variables of simple and complex types. It then causes an intentional access violation. After running the program, you should end up with a TestExceptionHandler.RPT file, which is viewable with any ASCII editor of your choice. Since the project chains control onto the previous handler (after writing out the report), you will see the system dialog informing you of an unhandled exception.
The information from an .RPT file isn't nearly as complete as the information from a minidump crash file. However, you may not want to use the minidump facilities, or you may have different needs. Feel free to extend the WheatyExceptionReport code any way you find useful and make sure to let me know if you do anything interesting with it!
DBGHELP 5.1, introduced in Windows XP, provides a healthy collection of new functions. In this column, I've shown you some of the new type and symbol enumeration APIs. There are even a few I didn't have space to cover here, but which are interesting nonetheless (for example, SymEnumTypes).
Using these new APIs, I've constructed an updated exception reporting facility that's incredibly simple to use. The code is well commented and shows how to call the new DBGHELP APIs. On my Web site (https://www.wheaty.net) I've also put the source for another utility (DbgHelpDemo) that does a more formal dump of the symbol and type information within a PDB file. While the DBGHELP information certainly isn't enough to write a full-blown debugger all by itself, the source for these two programs shows how DBGHELP can be a valuable asset in many cases.
Send questions and comments for Matt to email@example.com.