Debug
Detect and Plug GDI Leaks in Your Code with Two Powerful Tools for Windows XP
Christophe Nasarre
Code download available at: GDILeaks.exe (13,279 KB)
This article assumes you're familiar with Win32 and C#
SUMMARY
In a previous article, the author devised a simple method to detect Graphical Device Interface (GDI) objects that are not properly released by Win32-based applications on Windows 9x platforms. Because some newer versions of Windows require a slightly different approach to GDI leaks, the author has updated his techniques for those operating systems. He builds and explains two tools designed to detect and eradicate GDI leaks in applications running on Windows XP, Windows 2000, and Windows NT.
Contents
Windows 9x versus Windows XP
How GDI Manages Handles
Tracking Object Allocation with API Hooking
Using Stack Tracing to Monitor Allocations
Debugging to Determine DLLs to Patch
Injecting Running Code into Another Process
Debugger to Debuggee Communication
Deadlock and Timing Issues
Displaying GDI Objects
Conclusion
In Windows® 95, Windows 98, and Windows Me, a graphics device interface (GDI) handle is a 16-bit value, and any application can use it to call functions from the GDI API. In the March 2001 issue of MSDN® Magazine, I explained how to take advantage of the 16-bit side of these platforms to build GDIUsage, a tool that lists, compares, and displays the GDI objects used by all applications (see "Resource Leaks: Detecting, Locating, and Repairing Your Leaky GDI Code"). This article will show you how to write the same kind of tool for Windows XP. The same method that I will use here will apply equally well to Windows 2000 and Windows NT® 4.0, but for the purposes of this article, I will use Windows XP to represent all three platforms.
Figure 1 GDI Use in Windows 2000
This article describes the differences between Windows 9x and Windows XP and presents solutions to the problems that arise when implementing the tool shown in Figure 1. I'll explain how to determine the GDI resource consumption of a process using a code injection mechanism and how to patch a process or a DLL so that you are notified when a GDI object is created. Following that, I will show you how to write a Win32® debugger to drive a process, how to let the process and the debugger communicate with each other, and how to implement a call stack manager that provides additional information about each GDI object's allocation.
Windows 9x versus Windows XP
With Windows XP, a list of GDI objects is associated with each process, mostly managed in kernel mode by win32k.sys, the device driver responsible for USER and GDI implementation. Win32 applications call these system services through the API provided by user32.dll and gdi32.dll. Since Windows keeps a record of GDI objects on a per-process basis, only the application that created a GDI object is able to use the corresponding GDI functions with it.
The Windows 9x version of GDIUsage uses the GetObjectType API function which provides the type of GDI object for a given handle value to check if a random value is a valid GDI handle. Unlike Windows 9x, however, a Windows XP GDI object handle is a full 32-bit value. The possible range is from 0 to 0xFFFFFFFF, and, in order to list all real GDI objects, each possible handle value needs to be given to GetObjectType. This creates a real performance issue. The following code takes minutes to execute, and unfortunately, a lot of GDI objects detected by the running test application are not real (over five hundred of them!):
    DWORD dwObjectType;
    DWORD hGdi;
    for (hGdi = 0; hGdi < 0xFFFFFFFF; hGdi++)
    {
        dwObjectType = ::GetObjectType((HANDLE)hGdi);
        if (dwObjectType != 0)
        {
            TRACE("0x%08x -> %u\n", hGdi, dwObjectType);
        }
    }
This means that additional (and probably lengthy) testing code using GetObject would be necessary to obtain a reliable list, and the duration of the loop alone makes this method unusable. Finding another way to get the list of real GDI objects is therefore the first problem that needs to be solved. This article presents two solutions. The first uses the handle table that GDI manages, and the second works by hooking functions from the GDI API. Note that the first solution is neither supported nor guaranteed to behave the same way in future OS versions.
With GDIUsage for Windows 9x, it is possible to display a GDI object using functions corresponding to each GDI object type, such as BitBlt for a bitmap or FillRect for a brush. But since these API functions fail when called with a GDI object created by another process, the key feature of the tool I developed, which is the ability to "see" the leaking resource, disappears. On the other hand, the displaying code works fine in the application that created the GDI object. The solution is obvious: the display engine must run in the context of the other process. An implementation, based on Windows hooks as an interprocess communication mechanism, is described later in the article.
Finally, the Win32 debugging API is used to glue these techniques together, thereby allowing the implementation of a version of GDIUsage that works on Windows XP and other 32-bit Windows platforms.
How GDI Manages Handles
In my August 2002 article, WinDbg is used to obtain the description of the Process Environment Block (PEB) structure. In that description, reprinted here as Figure 2, the GdiSharedHandleTable field should have attracted your attention. This field is a pointer to a table where GDI stores its handles, including those created by other processes. In his book Windows Graphics Programming: Win32 GDI and DirectDraw (Prentice Hall, 2000), Feng Yuan provides another way to access this table and also describes the structure of each of its 0x4000 entries, as you can see here:
    typedef struct
    {
        DWORD pKernelInfo;
        // 2000/XP layout but these fields are inverted in Windows NT
        WORD  ProcessID;
        WORD  _nCount;
        WORD  nUpper;
        WORD  nType;
        DWORD pUserInfo;
    } GDITableEntry;
Figure 2 PEB Structure Using WinDbg and kdex2x86
    0:000> !kdex2x86.strct PEB
    Loaded kdex2x86 extension DLL
    struct _PEB (sizeof=488)
    +000 byte InheritedAddressSpace
    +001 byte ReadImageFileExecOptions
    +002 byte BeingDebugged
    +003 byte SpareBool
    +004 void *Mutant
    +008 void *ImageBaseAddress
    +00c struct _PEB_LDR_DATA *Ldr
    +010 struct _RTL_USER_PROCESS_PARAMETERS *ProcessParameters
    +014 void *SubSystemData
    +018 void *ProcessHeap
    +01c void *FastPebLock
    +020 void *FastPebLockRoutine
    +024 void *FastPebUnlockRoutine
    +028 uint32 EnvironmentUpdateCount
    +02c void *KernelCallbackTable
    +030 uint32 SystemReserved[2]
    +038 struct _PEB_FREE_BLOCK *FreeList
    +03c uint32 TlsExpansionCounter
    +040 void *TlsBitmap
    +044 uint32 TlsBitmapBits[2]
    +04c void *ReadOnlySharedMemoryBase
    +050 void *ReadOnlySharedMemoryHeap
    +054 void **ReadOnlyStaticServerData
    +058 void *AnsiCodePageData
    +05c void *OemCodePageData
    +060 void *UnicodeCaseTableData
    +064 uint32 NumberOfProcessors
    +068 uint32 NtGlobalFlag
    +070 union _LARGE_INTEGER CriticalSectionTimeout
    +070    uint32 LowPart
    +074    int32 HighPart
    +070    struct __unnamed3 u
    +070       uint32 LowPart
    +074       int32 HighPart
    +070    int64 QuadPart
    +078 uint32 HeapSegmentReserve
    +07c uint32 HeapSegmentCommit
    +080 uint32 HeapDeCommitTotalFreeThreshold
    +084 uint32 HeapDeCommitFreeBlockThreshold
    +088 uint32 NumberOfHeaps
    +08c uint32 MaximumNumberOfHeaps
    +090 void **ProcessHeaps
    +094 void *GdiSharedHandleTable
    +098 void *ProcessStarterHelper
    +09c uint32 GdiDCAttributeList
    +0a0 void *LoaderLock
    +0a4 uint32 OSMajorVersion
    +0a8 uint32 OSMinorVersion
    +0ac uint16 OSBuildNumber
    +0ae uint16 OSCSDVersion
    +0b0 uint32 OSPlatformId
    +0b4 uint32 ImageSubsystem
    +0b8 uint32 ImageSubsystemMajorVersion
    +0bc uint32 ImageSubsystemMinorVersion
    +0c0 uint32 ImageProcessAffinityMask
    +0c4 uint32 GdiHandleBuffer[34]
    +14c function *PostProcessInitRoutine
    +150 void *TlsExpansionBitmap
    +154 uint32 TlsExpansionBitmapBits[32]
    +1d4 uint32 SessionId
    +1d8 void *AppCompatInfo
    +1dc struct _UNICODE_STRING CSDVersion
    +1dc    uint16 Length
    +1de    uint16 MaximumLength
    +1e0    uint16 *Buffer
Each entry stores the details of a GDI handle whose value is easily calculated. Its lower 16 bits are the index in the table and the upper 16 bits are saved in the nUpper field. As its name implies, the ProcessID field contains the ID of the process that created the object. With this information, a simple loop allows you to list the objects that a given process is consuming and this is exactly what GDIndicator does, as shown in Figure 3.
Figure 3 Running Processes and Their GDI Objects Consumption
If you are interested in knowing the different ways to get the running process list, you should read my June 2002 article. Each process has an ID, which is used to collect the GDI objects it consumes from the shared table, using the ProcessID field to make the comparison. The resulting counts are displayed in GDIndicator under each object type column.
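The enumeration loop described above can be sketched in a few lines. This is a minimal, portable illustration and not GDIndicator's actual code: it works on a pointer to an already-obtained copy of the table (retrieving that pointer through the PEB is not shown), mirrors the GDITableEntry layout with fixed-width types, and assumes that a zero pKernelInfo marks an unused slot. The names GdiTableEntry, MakeGdiHandle, and CollectHandlesForProcess are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable mirror of the 2000/XP GDITableEntry layout quoted earlier.
struct GdiTableEntry {
    std::uint32_t pKernelInfo;
    std::uint16_t ProcessID;
    std::uint16_t nCount;
    std::uint16_t nUpper;
    std::uint16_t nType;
    std::uint32_t pUserInfo;
};

// Rebuild a 32-bit handle value: the upper 16 bits come from nUpper,
// the lower 16 bits are the entry's index in the table.
inline std::uint32_t MakeGdiHandle(std::size_t index, const GdiTableEntry& entry) {
    assert(index < 0x10000);  // the index must fit in 16 bits
    return (static_cast<std::uint32_t>(entry.nUpper) << 16) |
           static_cast<std::uint32_t>(index);
}

// Collect the handle of every in-use entry owned by processId.
// Assumption (not from the article): pKernelInfo == 0 means a free slot.
std::vector<std::uint32_t> CollectHandlesForProcess(const GdiTableEntry* table,
                                                    std::size_t entryCount,
                                                    std::uint16_t processId) {
    std::vector<std::uint32_t> handles;
    for (std::size_t i = 0; i < entryCount; ++i) {
        const GdiTableEntry& e = table[i];
        if (e.pKernelInfo != 0 && e.ProcessID == processId) {
            handles.push_back(MakeGdiHandle(i, e));
        }
    }
    return handles;
}
```

On a real system, entryCount would be the 0x4000 entries mentioned earlier, and counting the returned handles per object type would give figures comparable to the bracketed values in Figure 3.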
Unlike the other columns, the third column displays two values. The first value is the result of a call to GetGuiResources (this is supposed to return the count of handles to GUI objects in use by the process), and the second, bracketed value is the sum obtained during the parsing of the GDI handles shared table. As you see in Figure 3, these two values are often different and GetGuiResources always returns a larger count. There is no documented explanation for this difference, nor any obvious relationship with stock objects or unreleased objects. It is possible that, behind your back, GDI is allocating objects that are not stored in the shared table and therefore are objects for which you are not responsible.
One example of this kind of hidden allocation occurs during icon manipulation. When you create or load an icon, Windows needs more than one bitmap to implement the transparency effect. Usually there's one for the mask and one for the visible graphic. Unlike bitmaps, icons are handled by the USER system component, not GDI. This might be the reason that these allocations seem not to be tracked by the code behind GetGuiResources when called for GDI consumption.
Tracking Object Allocation with API Hooking
As you have seen, it is not easy to know what GDI objects are consumed by a given process. How can you know if an object has been allocated by the application code or by the GDI itself behind your back? If you could be notified by Windows when a GDI object is created, it would be very easy to store its handle value and build a list of the objects that are allocated by an application. Unfortunately, the Win32 API does not provide developers with that kind of notification mechanism.
If you want to know when a new object is created, you have to catch the function calls listed in Figure 4.
Figure 4 Functions that Create GDI Objects
Object Type | API Functions |
---|---|
bitmap | LoadBitmapA, LoadBitmapW, CreateBitmap, CreateBitmapIndirect, CreateCompatibleBitmap |
brush | CreateBrushIndirect, CreateSolidBrush, CreatePatternBrush, CreateDIBPatternBrush, CreateDIBPatternBrushPt, CreateHatchBrush |
device context | CreateCompatibleDC, CreateDCA, CreateDCW, CreateICA, CreateICW, GetDC, GetDCEx, GetWindowDC |
font | CreateFontA, CreateFontW, CreateFontIndirectA, CreateFontIndirectW |
metafile | CreateMetaFileA, CreateMetaFileW, CreateEnhMetaFileA, CreateEnhMetaFileW, GetEnhMetaFileA, GetEnhMetaFileW, GetMetaFileA, GetMetaFileW |
pen | CreatePen, CreatePenIndirect, ExtCreatePen |
region | PathToRegion, CreateEllipticRgn, CreateEllipticRgnIndirect, CreatePolygonRgn, CreatePolyPolygonRgn, CreateRectRgn, CreateRectRgnIndirect, CreateRoundRectRgn, ExtCreateRegion |
palette | CreateHalftonePalette, CreatePalette |
Fortunately, in his article "Learn System-Level Win32 Coding Techniques by Writing an API Spy Program" in the December 1994 issue of MSJ, Matt Pietrek explained how to write an API spy engine for the Win32 world. Given a particular module (process or DLL), this engine can replace the address of a function that is called (exported by a DLL) with the address of a function of your own. Once this substitution has been performed, each time the spied module calls a hooked function, your own handler will be executed in its place.
This API hooking principle has been refined over the years (see the John Robbins Bugslayer columns in the February 1998 and June 1999 issues of MSJ). If you need to know the possible implementations for the different Windows platforms, you should read Chapter 22 of Programming Applications for Microsoft Windows, Fourth Edition (Microsoft® Press, 1999) by Jeffrey Richter and Debugging Applications (Microsoft Press, 2000) by John Robbins. I have used the John Robbins method here.
Figure 5 Memory Layout for Call to GetDC
John's HookImportedFunctionsByName helper function takes a list of functions to patch, the loading address of the system DLL that exports them, the module that calls the functions to be patched, and a list of stubs to be redirected to. On exit, it fills a list containing the addresses of all the functions that were patched. For example, if App.exe is calling GetDC from USER32.DLL, you get the memory layout shown in Figure 5. Calling HookImportedFunctionsByName with the following input parameters results in the different layout shown in Figure 6.
- a list of system functions to patch (GetDC)
- the address of the DLL that exports the functions (USER32.DLL)
- the calling module (App.exe is directly calling GetDC)
- the list of patching functions (_GetDC from Hook.dll)
In this particular example, the list containing the addresses of all the patched functions would hold a single entry: the original address of GetDC (initial@).
Figure 6 Memory Layout for Patched Call to GetDC
The same mechanism that hooks each function listed in Figure 4 is also used to hook the functions that free GDI objects (as shown in Figure 7). With these two types of notifications, it is possible to keep track of the GDI objects that are still alive.
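To make the effect of the substitution concrete, here is a small simulation of the idea. It is not John Robbins' HookImportedFunctionsByName, which walks the real import descriptors of a loaded PE module. Instead, a plain array of slots stands in for the import table, and ImportSlot, HookRequest, and HookImportsByName are names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Generic function pointer type standing in for an import-table slot.
using Proc = void (*)();

// One slot of a simulated import table: the imported function's name and
// the address the calling module jumps through.
struct ImportSlot {
    const char* name;
    Proc        address;
};

// A request to redirect one imported function to a replacement stub.
struct HookRequest {
    const char* name;
    Proc        replacement;
};

// Swap each requested import for its stub. Returns the original addresses
// in the same order as the requests (nullptr when the module does not
// import that name), so the stubs can still reach the real functions.
std::vector<Proc> HookImportsByName(ImportSlot* imports, std::size_t importCount,
                                    const HookRequest* hooks, std::size_t hookCount) {
    assert(imports != nullptr || importCount == 0);
    std::vector<Proc> originals(hookCount, nullptr);
    for (std::size_t h = 0; h < hookCount; ++h) {
        for (std::size_t i = 0; i < importCount; ++i) {
            if (std::strcmp(imports[i].name, hooks[h].name) == 0) {
                originals[h] = imports[i].address;         // remember the original
                imports[i].address = hooks[h].replacement;  // redirect the caller
                break;
            }
        }
    }
    return originals;
}
```

After the call, any code that goes through the patched slot lands in the stub, while slots that were not requested keep pointing at the original functions.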
Figure 7 Functions that Free GDI Objects
Object Type | API Functions to Free Handles |
---|---|
Device context | DeleteDC |
Metafile | DeleteMetaFile, CloseMetaFile |
Enhanced Metafile | DeleteEnhMetaFile, CloseEnhMetaFile |
Others | DeleteObject |
The CGDIReflect class is responsible for providing the static stub methods that will be called in place of the system functions. This class derives from CAPIReflect, whose main goal is to redirect a function call from a given module into a static class member using macros. The replacement is done through a call to DoReflect that takes the caller module handle as a parameter. The responsibility of the derived class is to map each system function that you're interested in receiving information about to an appropriate stub counterpart, as you'll see later in this article.
Following the message map mechanism, a set of macros has been defined to help you automate the definition and declaration of stub functions. With the help of the /P compiler option, you can get a .i file for each source file that contains the code with all macros expanded. You need to watch the size of the resulting files, which can be huge, but this method shows you exactly which code is executed, as you will see in the figures and source code that I'll show you later in this article.
Three steps, which I'll outline in a moment, are needed to redirect a function from USER or GDI. Using the same example, I'll show you how to patch GetDC in USER32.DLL.
The first step is to declare a static method in CGDIReflect using the DECLARE_REFLECT_APIxxx macro, where xxx is the number of parameters of the function (1 for GetDC, which takes an HWND as a parameter). This declaration is bracketed by BEGIN_REFLECT_LIST, which defines a hidden TraceReflectCall helper method that provides trace services, and END_REFLECT_LIST, which does nothing:
    BEGIN_REFLECT_LIST()
        DECLARE_REFLECT_API1(GetDC, HDC, HWND, hWnd)
    END_REFLECT_LIST()
The DECLARE_REFLECT_APIxxx macro must be given the system function's name, its return type, and its parameter list (both types and names). Supplying this information allows the macro to be expanded into a static method of CGDIReflect (actually a stub) that shares the same prototype and executes the following steps. First, the original system function is called through an alias (instantiated later via DEFINE_API_REFLECT). Then, the handle allocated by the previous call and its type are stored in a CHandleInfo structure (see Figure 8), as is some additional data for future use (see the discussion of DoStackAddressDump in the next section). Finally, a static map member of CGDIReflect is updated to associate the newly allocated handle, the creation of which you wanted to trap, with the aforementioned structure.
Figure 8 CHandleInfo Class Declaration
    class CHandleInfo
    {
    public:
        CHandleInfo()
        {
            m_hObject = NULL;   // handle
            m_Type    = 0;      // type (bitmap, brush,...)
            m_pStack  = NULL;   // call stack
            m_Depth   = 0;      // call stack depth
        }
        virtual ~CHandleInfo()
        {
            if (m_pStack != NULL)
                delete [] m_pStack;
        }
    public:
        HANDLE m_hObject;
        DWORD  m_Type;
        DWORD  m_Depth;
        DWORD* m_pStack;
    };
Step two defines each static member that serves as an alias for a hooked system function's address, through the following macro:
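The three actions a stub performs (call the original through its alias, wrap the result, update the map) can be sketched without the macros. The code below is an approximation written for this discussion and not the expansion of DECLARE_REFLECT_API1: Handle is a stand-in for HDC and HWND, HandleRecord is a stripped-down CHandleInfo without the call stack, kTypeDC is an illustrative type tag, and the choice to record only non-null handles is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Handle    = std::uintptr_t;            // stand-in for HDC / HWND
using GetDCProc = Handle (*)(Handle hWnd);   // prototype of the hooked function

struct HandleRecord {                        // simplified CHandleInfo (no stack)
    Handle   hObject;
    unsigned type;
};

const unsigned kTypeDC = 3;                  // illustrative tag for a device context

class GdiReflectSketch {
public:
    static GetDCProc s_realGetDC;                     // alias to the original (like __GetDC)
    static std::map<Handle, HandleRecord> s_handles;  // live handles seen so far

    // Stub with the same prototype as the hooked function.
    static Handle GetDCStub(Handle hWnd) {
        assert(s_realGetDC != nullptr);      // the alias must be filled in first
        Handle h = s_realGetDC(hWnd);        // 1. call the original function
        if (h != 0) {                        // assumption: ignore failed calls
            s_handles[h] = HandleRecord{h, kTypeDC};  // 2 and 3. wrap and store
        }
        return h;
    }

    // Counterpart for the functions that free an object.
    static void OnRelease(Handle h) { s_handles.erase(h); }
};

GetDCProc GdiReflectSketch::s_realGetDC = nullptr;
std::map<Handle, HandleRecord> GdiReflectSketch::s_handles;
```

Because the stub returns exactly what the original returned, the hooked module cannot tell that its call was intercepted.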
DEFINE_API_REFLECT(CGDIReflect, GetDC);
In my example, that macro expands to:
CGDIReflect::GetDCProc CGDIReflect::__GetDC = 0;
During the third and final step, all these members need to be initialized and then used at execution time. The FillStubs method is called in both cases. First, FillStubs is called via Init with APIR_STATE_INIT as a parameter and no module handle to patch. In this pass, the system function's address is obtained using GetProcAddress and stored in the alias member (__GetDC for GetDC).
Next, FillStubs is called when a new DLL needs to be patched. DoReflect calls FillStubs with APIR_STATE_ENABLE as a parameter. It redirects every system call (GetDC) made by the module, whose loading address is passed as a parameter, to the corresponding static stub method (_GetDC in my example). This method follows the same pattern as the MFC message map:
    BEGIN_IMPLEMENT_API_REFLECT()
        •••
        IMPLEMENT_API_REFLECT(hModule, "USER32.DLL", GetDC);
        •••
    END_IMPLEMENT_API_REFLECT()
To help debug macros during execution, a few AfxTrace calls are scattered throughout the expanded code. SetTraceLevel allows you to choose which action will be traced, according to the values shown in Figure 9.
Figure 9 Flags for SetTraceLevel
Flag | Effects |
---|---|
API_TRACE_LEVEL_NONE | No trace |
API_TRACE_LEVEL_GETPROCADDRESS | Trace if a GDI function can be redirected |
API_TRACE_LEVEL_GETMODULEHANDLE | Detect if a module to be patched is there |
API_TRACE_LEVEL_REFLECT_CALL | Notify when a call is redirected to a stub function |
API_TRACE_LEVEL_DOREFLECT | Show which module is patched in DoReflect |
API_TRACE_LEVEL_INIT | Trace FillStubs when it gets the GDI function's address |
API_TRACE_LEVEL_HOOK_IMPORTED_FUNCTION | Log which GDI function is hooked for a given module |
You now have the CGDIReflect class, which allows you to redirect any call issued by a given module to a stub method whose only role is to store the newly created GDI object in a map. There is, however, one limitation to this implementation: it is not thread safe. If the application you need to check is multithreaded and several threads are calling GDI functions, the resulting behavior might be unpredictable because access to the handle map is not synchronized across threads.
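One way the limitation could be lifted is to funnel every access to the handle map through a lock. The sketch below is only an illustration of that idea and is not part of the article's code download: SynchronizedHandleMap is a hypothetical class, and it uses std::mutex where Win32 code of that era would more likely use a CRITICAL_SECTION.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical replacement for the unsynchronized handle map: every
// operation takes the same mutex, so stubs running on different threads
// cannot corrupt the underlying std::map.
class SynchronizedHandleMap {
public:
    // Called by a creation stub once the real function has returned.
    void Add(std::uintptr_t handle, unsigned type) {
        std::lock_guard<std::mutex> lock(m_lock);
        m_map[handle] = type;
    }

    // Called by a release stub; returns true if the handle was tracked.
    bool Remove(std::uintptr_t handle) {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_map.erase(handle) != 0;
    }

    // Number of handles currently considered alive.
    std::size_t Count() const {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_map.size();
    }

private:
    mutable std::mutex m_lock;                  // guards m_map
    std::map<std::uintptr_t, unsigned> m_map;   // handle value -> object type
};
```

The lock is held only while the map is touched, never across the call to the original GDI function, which keeps the window for contention small.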
Using Stack Tracing to Monitor Allocations
Each call to the functions of interest ends up in a stub that carries out two actions. The first is to wrap both the allocated handle and its type into a CHandleInfo object declared in GDIReflect.h. The second is more interesting. In the CHandleInfo object, each function address in the current call stack is stored in m_pStack—an allocated array of DWORDs—and the number of addresses held in m_pStack is saved to m_Depth. Therefore, in addition to presenting a GDI object in a graphical way, it is also possible to show the stack of function calls that lead to a specific GDI object allocation, as shown in Figure 10.
Figure 10 Calls that Lead to Objects
When it's time to walk the stack, imagehlp.dll and dbghelp.dll are your best friends. And to make your life even easier, John Robbins has wrapped this engine into the CSymbolEngine class whose evolution was described in the MSJ Bugslayer columns from April 1998 and February 1999. The last version, using DBGHELP, is detailed in John Robbins' book Debugging Applications which I mentioned earlier.
The CSymbolEngine class is a low-level layer on top of the numerous functions exported by DBGHELP. The CStackManager implemented in StackManager.cpp (in the /Common directory in the code download at the link at the top of this article) provides higher-level features with only three interesting methods. The first one is DoStackAddressDump, which allocates and fills a dump of the current call stack using CSymbolEngine. This method is called by every stub and stores each function address that led to the object's allocation.
Hexadecimal addresses are good for computers, but not for people. In order to transform the array of addresses returned by DoStackAddressDump into a human-readable format, as you saw in Figure 10, it is necessary to call DumpStackAllocation. This method takes both a stack dump and its depth, then returns the translated stack in a CString. The caller chooses which line separator is placed between addresses, either \r\n for displaying the CString in an edit box or \n for simply logging it using TRACE or OutputDebugString. There is no magic behind this method; it calls ConvertStackAddressIntoFunctionName for each address in the given array.
The magic is somewhere else. When the stack is dumped by DoStackAddressDump and an address is stored in the returned array, this method also finds the symbol corresponding to the address using SymGetModuleInfo, SymGetSymFromAddr, and SymGetLineFromAddr defined in CSymbolEngine (see ConvertAddress in StackManager.cpp in the code download for implementation details). Why do the conversion now? The answer is quite simple: at this particular moment you are sure that the corresponding DLL is loaded, but this might not be the case when DumpStackAllocation is called later.
If GDI objects are created frequently, a lot of stack dumps will be generated and stored in m_HandleMap. But the CHandleInfo object saved in this map keeps the address array, not an array of the translated strings. The trick is to use a map member, such as m_AddressToName, to keep track of the translation. This avoids storing long strings instead of DWORDs for each address in the stack dumps and therefore cuts down on memory consumption. The other benefit is that the stack dump is performed faster because m_AddressToName is used as a cache to avoid queries to the symbol engine.
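The split between resolving names early and formatting them late can be illustrated with a small cache. This is a sketch of the idea behind m_AddressToName under simplifying assumptions, not the CStackManager source: StackNameCache, Remember, and Dump are hypothetical names, the resolver callback stands in for the symbol engine, and std::string replaces CString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

class StackNameCache {
public:
    // Stand-in for the symbol engine: turns an address into a readable name.
    using Resolver = std::function<std::string(std::uint32_t)>;

    explicit StackNameCache(Resolver resolver) : m_resolver(std::move(resolver)) {
        assert(static_cast<bool>(m_resolver));
    }

    // Resolve each new address now, while its module is certainly loaded.
    // Addresses already in the cache do not reach the resolver again.
    void Remember(const std::uint32_t* stack, std::size_t depth) {
        for (std::size_t i = 0; i < depth; ++i) {
            if (m_addressToName.find(stack[i]) == m_addressToName.end()) {
                m_addressToName[stack[i]] = m_resolver(stack[i]);
            }
        }
    }

    // Translate a stored stack later using only the cache. The separator
    // is placed between entries; unknown addresses are shown as "?".
    std::string Dump(const std::uint32_t* stack, std::size_t depth,
                     const std::string& separator) const {
        std::string text;
        for (std::size_t i = 0; i < depth; ++i) {
            if (i != 0) text += separator;
            auto it = m_addressToName.find(stack[i]);
            text += (it != m_addressToName.end()) ? it->second : std::string("?");
        }
        return text;
    }

private:
    Resolver m_resolver;
    std::map<std::uint32_t, std::string> m_addressToName;  // one string per address
};
```

Each stored stack stays an array of 32-bit values, and a name is kept only once no matter how many stacks contain its address.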
Even though you know how to hook a list of GDI functions, you still need to know in which module those GDI calls are made. Let's say that an application using MFC in a shared DLL is creating a CPen object to manipulate APIs related to Windows pens. The real call to CreatePen (which returns the pen handle) is done within the MFC DLL, not in the calling application code. If you hook only the API functions called from the executable, you miss the calls from all the DLLs used by the application.
Debugging to Determine DLLs to Patch
Under Windows XP, it is easy to get the list of all DLLs loaded by a process at a given time thanks to the PSAPI function EnumProcessModules, as Matt Pietrek explained in "Under the Hood" in the August 1996 issue of MSJ. For dynamically loaded libraries, however, the problem is a little trickier. In addition to hooking system functions, it also becomes mandatory to hook the LoadLibrary calls that the main program issues (both ANSI and UNICODE versions) in order to detect when a new DLL is loaded and recursively apply the same hooking treatment to it.
Two final questions need to be answered. First, how can you know which DLLs should be patched inside the spied process? Second, how can you ensure this code executes inside another process? If such a solution exists, it should also be possible to use it to display a graphical representation of any GDI object handle. In the article on debugging published in August 2002, the Win32 debugging API was used to detect the DLLs that are loaded, either statically or dynamically by a debugged process. The main drawback of this method is the need to start and debug the application. Unlike the Windows 9x version of GDIUsage, the detected GDI objects are necessarily the ones allocated by the debugged application, as you saw in Figure 1.
The Win32 debugging API allows you to write code that launches an application (the debuggee) and is then notified when events occur, such as when a new DLL has been loaded. This is exactly what you need. In order to easily write a debugger, you simply override the virtual methods that are called when a debug event occurs. The CGDIDebugger class is derived from CApplicationDebugger, which I introduced in my August 2002 article. Figure 11 shows the names of the methods that are overridden by CGDIDebugger, as well as an explanation of what each method does. I will discuss the communication mechanisms between the debugger and the debuggee later.
Figure 11 Methods Overridden by CGDIDebugger
Overridden Method | Note |
---|---|
PreLoadingProcess | Needs to perform member initialization. |
OnCreateProcessDebugEvent | StartTraceGDI is called by the overriding method and it must make the DLL code inject itself into the debuggee. |
OnExitProcessDebugEvent | StopTraceGDI is called by the overriding method and it must perform debugging session cleanup. |
OnLoadDLLDebugEvent | PatchModule is called by the overriding method and it must add the DLL to an internal list and notify the code in the debuggee that each call to the GDI must be hooked. |
OnUnloadDLLDebugEvent | Needs to update the internal DLL list. |
OnThreadMessage | Receives notification from the injected code running within the debuggee. |
Figure 12 Search for a String
In addition to these methods, OnOutputDebugStringDebugEvent has been overridden in order to redirect the traces issued by the debuggee (prefixed by >) to a dedicated listbox. It is also possible to copy a selection to the clipboard or search for a string, as shown in Figure 12. When you add a trace using TRACE or OutputDebugString, it appears in the listbox. This is an efficient way to debug the code and to tell output from the debuggee (prefixed by >) apart from output from the debugger.
Injecting Running Code into Another Process
There is one remaining problem to be solved now: there must be a way to make some of the code run in the context of another application. Luckily, Jeffrey Richter dealt with this problem a long time ago in his article "Load Your 32-bit DLL into Another Process's Address Space Using INJLIB" (MSJ, May 1994).
Since we are usually only interested in applications that are using the GDI API, we can assume that such an application uses at least one window to display its drawings. (Otherwise, why would it need GDI?) Therefore, from among the different solutions, the one based on Windows hooks seems like the best option. Once you install the following hook, Windows calls your GetMessageHookProc callback whenever any thread in any process executes GetMessage:
SetWindowsHookEx(WH_GETMESSAGE, GetMessageHookProc, hInstance, 0)
Since this is a system-wide hook (the last parameter is 0), the code of the callback must be in a DLL that is mapped into the address space of each process where a thread calls GetMessage.
If the hook procedure and the driver application agree, it is easy to establish a communication channel between them using predefined messages. This would be a good way to allow the debugger to send requests to some injected code that runs in the context of the spied application, which is exactly what is needed for the GDI object display dialog box! When the hook procedure intercepts the first message, it first redirects calls for the DLLs that are already loaded. Then it starts a new thread that is dedicated to handling requests from the debugger. This creates a communication channel between the debugger and the debuggee (see StartInfiltratedThread in GDITrace.cpp for details).
The functions called by the debugger and the debuggee, from the hook procedure to the infiltrated thread function, have been gathered inside GDITrace.dll, whose behavior is implemented by the CGDITraceApp class. This DLL is statically linked to the debugger application GDIUsage, but is dynamically loaded within the processes that trigger the Windows hook. The functions to be called by the debugger are declared in _GdiTrace.h and grouped together in GdiTrace.cpp to help you understand which part is used by the debugger and which by the debuggee. But why mix such different code in the same DLL? A few variables need to be shared between the two sides, and it is easy to share a variable's value between instances of the same DLL, as shown in Figure 13.
Figure 13 Sharing Variables Between Processes
    // shared variables
    //
    #pragma data_seg(".shared")
    // debugger
    DWORD s_dwCallingThreadID = 0;
    // debuggee
    DWORD s_dwInfiltratedThreadID = 0;
    DWORD s_dwProcessID = 0;
    // API hooking initialization flag
    // --> set to TRUE once the injected hook function has been called
    BOOL s_bDebuggeeIsStarted = FALSE;
    #pragma data_seg()
    // Be careful not to add any unwanted whitespaces in the following
    // pragma!
    #pragma comment(linker, "-section:.shared,rws")
The code defines a PE section named .shared, with the read/write/shared attributes (rws), that contains the four variables prefixed by s_ that need to be shared. Based on this declaration, Windows will store these variables in a memory block that is shared by any process loading the DLL. Therefore, these variables have the same value in every process, in particular the debugger and the debuggee. Let's take a look at what goes on when the debugger starts a debuggee and how these variables are used.
When the debuggee starts, the debugger thread receives a CREATE_PROCESS_DEBUG_EVENT, which is handled by OnCreateProcessDebugEvent, which in turn calls StartTraceGDI. This function executes SetSharedVariables, setting the value of s_dwCallingThreadID with the ID of the debugger thread. If the current process ID is the same as the one saved in s_dwProcessID, the hook procedure knows that it is running in the debuggee context and starts to patch GDI calls from the DLLs that have already been loaded. The dedicated thread, which is started next by the hook procedure, stores its ID in s_dwInfiltratedThreadID. Finally, s_bDebuggeeIsStarted is set to TRUE when the hook procedure work is successful and is then used by the debugger to decide if the infiltrated thread is ready to answer requests.
As you need to pass or retrieve a list of GDI object handles between the debugger and the debuggee, a shared buffer larger than just a DWORD or a BOOL is required. In addition to the four variables, a memory mapped file named GDITrace_SharedBuffer is used, and the corresponding memory is pointed to by the CGDITraceApp member m_lpvMem. It is initialized during the DLL startup (see CGDITraceApp::InitInstance in GDITrace.cpp for implementation details). This buffer needs to be created or initialized only when the DLL is loaded by two processes: GDIUsage as a debugger and its current debuggee.
The s_dwProcessID shared variable is used to recognize the difference between the two processes. Its value is always 0 when there is no started debuggee; otherwise, it contains the debuggee process ID. When the DLL is loaded into a process, its InitInstance checks whether s_dwProcessID is equal to 0 (should be GDIUsage) or to GetCurrentProcessId (should be the debuggee) in order to create the memory mapped file.
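The test that InitInstance performs on s_dwProcessID amounts to a three-way classification of the loading process. The sketch below restates that decision in isolation. It is an interpretation of the paragraph above and not code taken from GDITrace.cpp, and the names LoaderRole, ClassifyLoader, and NeedsSharedBuffer are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Role of the process that is currently loading the DLL.
enum class LoaderRole { Debugger, Debuggee, Unrelated };

// sharedDebuggeePid plays the part of s_dwProcessID: it is 0 while no
// debuggee has been started, otherwise it holds the debuggee's process ID.
// currentPid is what GetCurrentProcessId would return in the loading process.
inline LoaderRole ClassifyLoader(std::uint32_t sharedDebuggeePid,
                                 std::uint32_t currentPid) {
    if (sharedDebuggeePid == 0) return LoaderRole::Debugger;           // should be GDIUsage
    if (sharedDebuggeePid == currentPid) return LoaderRole::Debuggee;  // the spied process
    return LoaderRole::Unrelated;  // another process reached by the system-wide hook
}

// Only the debugger and its debuggee need the memory mapped file.
inline bool NeedsSharedBuffer(LoaderRole role) {
    return role != LoaderRole::Unrelated;
}
```

Any other process that the system-wide hook pulls the DLL into falls in the third category and can skip the mapping entirely.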
Debugger to Debuggee Communication
The s_dwInfiltratedThreadID shared variable is used by the debugger to post a request (through a simple Windows message using PostThreadMessage) that will be handled by the thread infiltrated in the debuggee. The other s_dwCallingThreadID shared variable is needed when it's time for the debuggee to notify the debugger that such a request has been fulfilled. For example, when the user clicks the "Take Snapshot!" button, GDIUsage needs to collect the allocated GDI objects from the debuggee.
GDIUsage posts a TM_GET_LIST message to the thread infiltrated into the debuggee, whose ID is stored in s_dwInfiltratedThreadID. The message routes execution to the OnGetList function along with a parameter, UM_SNAPSHOT_READY, that will be used as the callback message received by the GDIUsage main dialog. Why not simply use the same TM_GET_LIST? The answer relates to sharing code. Both the "Take Snapshot!" and "Compare" buttons need to get the same list of allocated GDI objects (even though this list is used in a different way) in order to update the corresponding part of the GDIUsage user interface.
To sum up, the TM_GET_LIST thread message triggers the processing of the GDI objects on the debuggee side. In addition, there are two ways to update the GDIUsage main dialog, based on two user-defined messages: UM_SNAPSHOT_READY and UM_COMPARE_READY.
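The routing can be modeled without any Windows dependency. The sketch below rests on several assumptions: the numeric message values are invented for illustration (only the WM_APP base of 0x8000 comes from the Windows headers), the callback message is assumed to travel in the wParam slot, and ThreadMessage, MakeGetListRequest, ReplyFromDebuggee, and UiMessageFor are hypothetical names. In the real tool the requests would go through PostThreadMessage.

```cpp
#include <cstdint>

// WM_APP is 0x8000 in the Windows headers; the offsets below are made up.
constexpr unsigned kWmApp            = 0x8000;
constexpr unsigned TM_GET_LIST       = kWmApp + 1;   // thread message (assumed value)
constexpr unsigned UM_SNAPSHOT_READY = kWmApp + 10;  // UI callback (assumed value)
constexpr unsigned UM_COMPARE_READY  = kWmApp + 11;  // UI callback (assumed value)

// Minimal stand-in for a posted thread message.
struct ThreadMessage {
    unsigned       message;
    std::uintptr_t wParam;   // assumed to carry the callback message
};

// Debugger side: both buttons send the same request, only the callback differs.
inline ThreadMessage MakeGetListRequest(bool forCompare) {
    return {TM_GET_LIST, forCompare ? UM_COMPARE_READY : UM_SNAPSHOT_READY};
}

// Debuggee side: after filling the shared buffer, echo the request back so
// the debugger knows which part of the UI to refresh.
inline ThreadMessage ReplyFromDebuggee(const ThreadMessage& request) {
    return {TM_GET_LIST, request.wParam};
}

// Debugger side: translate the reply into the message for the main dialog.
// Returns 0 for messages this sketch does not understand.
inline unsigned UiMessageFor(const ThreadMessage& reply) {
    return reply.message == TM_GET_LIST ? static_cast<unsigned>(reply.wParam) : 0;
}
```

Because the debuggee only echoes the parameter, a single handler on its side serves both buttons, which is the code-sharing benefit described above.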
The infiltrated thread wakes up and asks OnGetList to handle the request. This CGDITraceApp method enumerates the GDI allocations detected within the debuggee through the patched stubs and copies each object description (handle value and type) into the memory mapped file shared buffer pointed to by m_lpvMem under the following GDI_LIST format:
typedef struct
{
   DWORD   dwType;
   HGDIOBJ hObject;
} GDI_ITEM;

typedef struct
{
   DWORD    dwCount;   // count of meaningful GDI_ITEM slots in Items
   GDI_ITEM Items[];
} GDI_LIST;
In order to notify the debugger that the list is ready, the same TM_GET_LIST message is posted back to the thread identified by s_dwCallingThreadID. This message is received in GDIUsage context by the thread responsible for debug events and routed to CGDIDebugger::OnThreadMessage. This method notifies the UI thread by sending the right user message (UM_SNAPSHOT_READY for this example) to the main dialog with a pointer to the shared memory defined by the memory mapped file where the GDI objects have been stored by the debuggee. This serialized and automatically marshaled list is used to instantiate a CGdiResources by using its CreateFromList method. This class is used to wrap a GDI objects list and also provide services such as enumeration and graphical display.
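The round trip through the shared buffer can be sketched in portable C++. This is an illustration, not the code from GDITrace.cpp: GdiItem, WriteGdiList, and ReadGdiList are hypothetical names, fixed-width integers stand in for DWORD and HGDIOBJ, and the count is packed directly in front of the items, so the byte layout is not guaranteed to match GDI_LIST.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for GDI_ITEM so the sketch compiles without the Windows headers.
struct GdiItem {
    std::uint32_t  type;    // models dwType
    std::uintptr_t handle;  // models hObject
};

constexpr std::size_t kHeaderSize = sizeof(std::uint32_t);  // models dwCount

// Debuggee side: copy as many items as fit, preceded by their count.
// Returns the number of items actually written.
inline std::size_t WriteGdiList(void* buffer, std::size_t bufferSize,
                                const std::vector<GdiItem>& items) {
    if (buffer == nullptr || bufferSize < kHeaderSize) return 0;
    const std::size_t capacity = (bufferSize - kHeaderSize) / sizeof(GdiItem);
    const std::size_t count = items.size() < capacity ? items.size() : capacity;
    const std::uint32_t header = static_cast<std::uint32_t>(count);
    unsigned char* bytes = static_cast<unsigned char*>(buffer);
    std::memcpy(bytes, &header, kHeaderSize);
    if (count > 0) {
        std::memcpy(bytes + kHeaderSize, items.data(), count * sizeof(GdiItem));
    }
    return count;
}

// Debugger side: rebuild the list, never trusting the count beyond the buffer.
inline std::vector<GdiItem> ReadGdiList(const void* buffer, std::size_t bufferSize) {
    std::vector<GdiItem> items;
    if (buffer == nullptr || bufferSize < kHeaderSize) return items;
    const unsigned char* bytes = static_cast<const unsigned char*>(buffer);
    std::uint32_t header = 0;
    std::memcpy(&header, bytes, kHeaderSize);
    const std::size_t capacity = (bufferSize - kHeaderSize) / sizeof(GdiItem);
    const std::size_t count = header < capacity ? header : capacity;
    items.resize(count);
    if (count > 0) {
        std::memcpy(items.data(), bytes + kHeaderSize, count * sizeof(GdiItem));
    }
    return items;
}
```

Clamping the count on both sides keeps a full buffer from being overrun by the writer or overread by the reader. Since the items hold plain values rather than pointers, the data stays meaningful when it is read from the other process.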
Deadlock and Timing Issues
Before digging into the graphical display of a remote GDI object, you should be aware of a possible deadlock issue. The previously discussed technique for collecting GDI objects is asynchronous because it relies on Windows messages exchanged between the debugger and the debuggee. If you need strong synchronous communication, you could use Win32 events. For example, the infiltrated thread waits for a specific named event and calls MsgWaitForMultipleObjectsEx to wake up when either a message is posted to its queue or the event gets signaled (see CGDITraceApp::InfiltratedThreadProc in GDITrace.cpp for source code details). In this implementation, the event is used to ask the thread to end its life and therefore is not really synchronous.
Another case that would require true synchronous behavior is the patching of GDI calls when a DLL is loaded by the debuggee. In order to intercept the GDI calls, the debugger must notify the infiltrated thread as soon as possible. Otherwise, the debuggee might start allocating GDI objects before the interception stubs are installed. Here is a scenario that looks like a good idea to implement:
1. The debugger thread is notified that a DLL is loaded in the debuggee through a LOAD_DLL_DEBUG_EVENT returned by WaitForDebugEvent.
2. The OnLoadDLLDebugEvent overridden method receives the hModule corresponding to the DLL, stores it in a new shared variable, and asks the debuggee to patch the GDI calls for that particular DLL by signaling an event.
3. In order to avoid reentrancy, if another DLL gets loaded by the debuggee, OnLoadDLLDebugEvent waits for another event that is signaled when the debuggee finishes its patching work.
4. The debuggee-infiltrated thread wakes up from MsgWaitForMultipleObjectsEx since one of the events it is waiting for has been signaled.
5. The CGDITraceApp::OnNewDLL method redirects the GDI calls for the DLL loaded at the address defined in the shared variable filled by the debugger with the DLL hModule.
6. The event the debugger is waiting for is signaled by the infiltrated thread.
7. The infiltrated thread calls MsgWaitForMultipleObjectsEx waiting for another request to be fulfilled.
7. The debugger thread is resumed since the event it was waiting for has been signaled.
Note that the last two steps have the same sequence number because their code is running in two different threads that are scheduled by Windows. You should not expect one to be executed before the other.
Even though this scenario seems to be perfect, it leads to a deadlock between the debugger thread and the thread injected into the debuggee. There is a snag between Steps 3 and 4. The Win32 debugging API assumes that the debugger is waiting for a debuggee event using WaitForDebugEvent. When this function returns, and until ContinueDebugEvent is called, all threads in the debuggee are suspended, even those that have not generated the debug event. Therefore, the debugger thread stays in Step 3, and Step 4 will never be executed by the debuggee since ContinueDebugEvent is supposed to execute after the synchronous communication ends. You should remember that it is not possible to synchronize an action in the debuggee from a debugging event received by the debugger.
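The deadlock can be reproduced with a small model built on standard C++ threads rather than on the Win32 debugging API. Everything in this sketch is an assumption made for illustration: RunDllLoadScenario is a hypothetical function, the resumed flag stands in for the effect of ContinueDebugEvent, and a timeout replaces the infinite wait so that the flawed ordering reports failure instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns true if the modeled infiltrated thread acknowledged the patch
// request before the timeout expired.
// waitBeforeContinue == true  : the debugger waits for the acknowledgment
//                               before "calling ContinueDebugEvent" (flawed).
// waitBeforeContinue == false : the debugger continues first, then waits.
inline bool RunDllLoadScenario(bool waitBeforeContinue,
                               std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    bool resumed = false;    // debuggee threads run only once this is true
    bool requested = false;  // models the "patch this DLL" event
    bool patched = false;    // models the "patching done" event

    std::thread infiltrated([&] {
        std::unique_lock<std::mutex> lock(m);
        // A frozen thread cannot react to the request until it is resumed.
        cv.wait(lock, [&] { return resumed && requested; });
        patched = true;      // models CGDITraceApp::OnNewDLL finishing
        cv.notify_all();
    });

    bool acknowledged = false;
    {
        std::unique_lock<std::mutex> lock(m);
        requested = true;                         // step 2: signal the request
        if (!waitBeforeContinue) resumed = true;  // continue before waiting
        cv.notify_all();
        // Step 3: wait for the acknowledgment.
        acknowledged = cv.wait_for(lock, timeout, [&] { return patched; });
        resumed = true;      // the continue call, too late in the flawed order
        cv.notify_all();
    }
    infiltrated.join();
    return acknowledged;
}
```

Calling RunDllLoadScenario(true, ...) times out, because the acknowledgment can only be produced by a thread that the debugger itself keeps frozen, while RunDllLoadScenario(false, ...) succeeds. With the real API the wait would have no timeout, so the flawed ordering would hang both processes.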
In the case of GDIUsage, the message-based mechanism seems to provide fast enough notification. The real problem is somewhere else. Since the infiltrated thread is started after a first message has been retrieved by the debuggee, all statically linked DLLs have already been initialized. If some of them have allocated GDI objects, this consumption will never be detected by GDIUsage. In this case, you need another way to get the patching code to be executed. GDIUsage helps you to detect and locate GDI object creations that are not released during the life of your application, but not those that occur at its startup.
Displaying GDI Objects
The GDIUsage user interface, which you saw in Figure 1, allows the user to take a snapshot of the GDI objects that are being consumed by a process at any given point in time and compare it later with the current state of the same process. Each set of objects is stored in a CGdiResources object. As you have seen in the TM_GET_LIST explanation, once the objects list has been serialized and marshaled between the debuggee and the debugger through the memory mapped file, a CGdiResources object is initialized using CreateFromList.
In the Windows 9x version of GDIUsage, the class CGdiResourcesDlg, responsible for displaying GDI objects, takes a pointer to a CGdiResources object as a parameter. Unfortunately, in the Windows XP version of GDIUsage, a CGdiResourcesDlg is worthless within the context of the debugger since the GDI functions used to graphically display a GDI object created inside the debuggee will always fail.
The solution is to move CGdiResourcesDlg and CGdiResources inside GDITrace.dll, which is loaded into the debuggee by the Windows hook, as discussed earlier. After retrieving the list of allocated objects, it is time to display such a list. The same communication mechanism as the one discussed earlier is used, but before the list can be displayed and before posting a message to the infiltrated thread, the list must be saved in the debugger context to the memory mapped file. The debugger serializes a CGdiResources list (either the current snapshot or the result of a compare, depending on the button clicked by the user) into the memory mapped file and posts a TM_SHOW_LIST message to the thread infiltrated into the debuggee with the GDIUsage main dialog window handle as a parameter. This treatment is carried out by CGDIDebugger::ShowList for the serialization and ShowRemoteList for the message posting.
Like TM_GET_LIST, the TM_SHOW_LIST message is handled by the infiltrated thread, but is next routed to CGDITraceApp::OnShowList. This method initializes a CGdiResources from the serialized and marshaled data stored in the memory mapped file. It is now possible to let a CGdiResourcesDlg display the corresponding GDI objects.
Unlike the TM_GET_LIST message, this display command posted to the debuggee does not require any return code or information to be sent back to the debugger. In any case, the user expects GDIUsage to stay disabled until the CGdiResourcesDlg dialog is dismissed. This is the reason why the handle of the GDIUsage main dialog is passed as a parent window to the CGdiResourcesDlg object.
The implementation of this class has not been changed from the Windows 9x version except for two enhancements. The first one uses the way a GDI handle is built to detect if it references a stock object or not and adds a * to the corresponding line in the listbox. This feature is invisible in GDIUsage since no API to create a stock object has been patched, but you might want to implement it when writing your own code.
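As a minimal sketch of the first enhancement, the check below tests a single bit of the handle value. It relies on an assumption that is not stated in this section and is not documented by Microsoft: that bit 23 of a GDI handle on these platforms flags a stock object, with the object type in the bits below it and the table index in the low 16 bits. LooksLikeStockObject and DecorateLine are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// ASSUMPTION: undocumented handle layout in which bit 23 marks stock objects.
constexpr std::uint32_t kStockObjectBit = 0x00800000u;

// True when the handle value carries the assumed stock-object flag.
inline bool LooksLikeStockObject(std::uint32_t handleValue) {
    return (handleValue & kStockObjectBit) != 0;
}

// Appends the marker that the listbox line would show for a stock object.
inline std::string DecorateLine(const std::string& line, std::uint32_t handleValue) {
    return LooksLikeStockObject(handleValue) ? line + " *" : line;
}
```

Because the layout is an implementation detail, a check like this is best kept to diagnostic display, as it is here, rather than used to drive program logic.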
The second enhancement is the need to detect when the user clicks or double-clicks a GDI object of the list in order to present the corresponding stack trace, as you saw in Figure 10. Since the CGdiResourcesDlg code should not be linked to the stack trace code, a generic callback interface, INotificationCallBack, has been defined. The CGDITraceApp class derives from it, implements OnDoubleClick, and registers itself using CGdiResourcesDlg::SetNotificationCallBack.
When the user double-clicks a GDI object, CGdiResourcesDlg checks if a callback has been registered. If so, it calls the OnDoubleClick method of the callback with the CGdiObj description corresponding to the double-clicked GDI object as a parameter. Then, CGDITraceApp extracts the call stack from the passed CGdiObj and instantiates a CCallStackDlg with it in order to display the stack trace to the user (see CGDITraceApp::OnDoubleClick in GDITrace.cpp for implementation details).
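The decoupling described above can be sketched as follows. Only the names INotificationCallBack, OnDoubleClick, and SetNotificationCallBack come from the article; the signatures are guesses, and GdiObjInfo, ResourcesDialogModel, and StackTraceViewerModel are hypothetical stand-ins for CGdiObj, CGdiResourcesDlg, and CGDITraceApp.

```cpp
#include <string>

// Simplified stand-in for CGdiObj (hypothetical fields).
struct GdiObjInfo {
    unsigned long handle;
    std::string   callStack;   // recorded when the object was allocated
};

// The dialog knows only this interface, not the stack trace code.
class INotificationCallBack {
public:
    virtual ~INotificationCallBack() = default;
    virtual void OnDoubleClick(const GdiObjInfo& object) = 0;
};

// Models the part of the dialog that forwards double-clicks.
class ResourcesDialogModel {
public:
    void SetNotificationCallBack(INotificationCallBack* callback) {
        m_callback = callback;
    }

    // Returns true when a registered callback was notified.
    bool HandleDoubleClick(const GdiObjInfo& object) {
        if (m_callback == nullptr) return false;   // nothing registered
        m_callback->OnDoubleClick(object);
        return true;
    }

private:
    INotificationCallBack* m_callback = nullptr;
};

// Models the application object that shows the stack trace.
class StackTraceViewerModel : public INotificationCallBack {
public:
    void OnDoubleClick(const GdiObjInfo& object) override {
        lastShownStack = object.callStack;   // the real code opens a dialog
    }
    std::string lastShownStack;
};
```

A host that records no stack traces simply never registers a callback, and double-clicks are then ignored.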
This feature has also been added to the GDIndicator tool that was shown in Figure 3. Unlike GDIUsage, the injection mechanism is based on CreateRemoteThread and has already been discussed in my August article. One interesting fact to note is that the serialization and display code is exactly the same as in GDIUsage; only the remoting mechanism has changed. And since there is no recorded stack trace, no handler for double-click is registered.
There are two final pitfalls that must be dealt with. Since the dialog that displays GDI objects runs in another process context, some weird things happen. First, under Windows 2000 and Windows XP, it is not easy to set the foreground window from another process. You must call AllowSetForegroundWindow within the process that owns the foreground window in order to let another process set one of its windows as the new foreground window. Until the user cancels it, the dialog box code runs as an infinite loop in the context of the current thread of the other process. As a result, GDIndicator hangs until the dialog is dismissed. In order to avoid this unwelcome behavior, its main window is hidden during the lifetime of the dialog.
The second side effect is more difficult to work around. The windows created in the thread taken over by the dialog will no longer receive all their messages because the dialog procedure filters them. For example, if you use the tools outlined here to find the GDI objects that are allocated by Notepad, its repainting works fine and you can easily navigate into its menus, but the selected commands won't trigger until after the dialog is dismissed. This is a tricky feature to use but it is really valuable when you're hunting down elusive GDI leaks.
Conclusion
Between Windows 9x and Windows XP, things have changed in the GDI. Even though a lot of code in my program has been reused from its Windows 9x implementation, such as the management of a set of GDI objects and their graphical display, the inner mechanisms used to get the list of GDI objects allocated by a process are new. The Windows XP version is based on the Win32 debugging API, DLL injection, Windows hooks, and API patching and provides more features than its Windows 9x counterpart.
In addition to the list of calls leading to each GDI allocation, a final touch has been added. When the remote process ends, the ExitInstance method of the injected DLL gets called. CGDITraceApp takes advantage of this notification to make a last enumeration of the GDI objects and to dump the ones that are still valid. This is the same kind of final dump you get for your MFC application when you use DEBUG_NEW as your memory allocator in debug builds in order to detect memory leaks.
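The final enumeration can be sketched as a filter over the tracked handles. TrackedObject and CollectLeaks are hypothetical names, and the validity test is passed in as a predicate so the sketch stays portable. On Windows, one plausible predicate (an assumption, not necessarily what GDITrace.cpp does) would call GetObjectType on the handle and treat a nonzero result as "still valid".

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// One allocation recorded by the patched stubs (hypothetical structure).
struct TrackedObject {
    std::uintptr_t handle;
    std::uint32_t  type;
};

// Returns the tracked objects whose handles are still valid at exit time.
// isStillValid is supplied by the caller so the sketch has no GDI dependency.
inline std::vector<TrackedObject> CollectLeaks(
        const std::vector<TrackedObject>& tracked,
        const std::function<bool(std::uintptr_t)>& isStillValid) {
    std::vector<TrackedObject> leaks;
    for (const TrackedObject& object : tracked) {
        if (isStillValid(object.handle)) {
            leaks.push_back(object);   // never deleted: report it
        }
    }
    return leaks;
}
```

The returned list is what would be written to the debug output, along with the call stack saved for each object.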
These mechanisms can be used to find any other type of leak, assuming that you know the API functions that are called to create the resources in which you are interested. For example, kernel object leaks (such as files and registry keys) might be found using existing tools such as oh.exe and dh.exe from the Platform SDK or ProcessExplorer from https://www.sysinternals.com. Even though they provide the list of kernel objects used by any process (and comparisons for ProcessExplorer), you could take advantage of the call stack and final leaks dump offered by the techniques presented in this article to track down your system leaks more easily.
For related articles see:
Windows XP: Escape from DLL Hell with Custom Debugging and Instrumentation Tools and Utilities
Windows XP: Escape from DLL Hell with Custom Debugging and Instrumentation Tools and Utilities, Part 2
For background information see:
Windows Graphics Programming: Win32 GDI and DirectDraw by Feng Yuan (Prentice Hall PTR, 2000)
Debugging Applications by John Robbins (Microsoft Press, 2000)
Programming Applications for Microsoft Windows by Jeffrey Richter (Microsoft Press, 1999)
Christophe Nasarre is a development manager for Business Objects in France. He has written several low-level tools for Windows since version 3.0. You can reach Christophe at cnasarre@montataire.net.