The Designer Process That Would Not Terminate (Part 2)

In my previous post (https://blogs.msdn.com/geoffda/archive/2007/08/31/the-designer-process-that-would-not-terminate.aspx) I talked about an issue where RCWs (Runtime Callable Wrappers) awaiting garbage collection were holding references to a COM object and preventing it from being deterministically shut down.  In this post, we'll continue the discussion--this time focusing on how to use the debugger to track down such a problem.

To facilitate this blog entry, I started to code up a simple example of the problem.  What I discovered is that the insidious nature of RCW leaks can be reproduced in code that is even simpler than I had originally intended to write.  Hopefully, this example will hammer home just how careful you have to be when using RCWs.

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;

namespace RCWLeak
{
    class Program
    {
        static void Main(string[] args)
        {
            ApplicationClass excel = new ApplicationClass();
            Workbook workbook = excel.Workbooks.Add(Type.Missing);
            Marshal.FinalReleaseComObject(workbook);
            excel.Quit();
            Marshal.FinalReleaseComObject(excel);

            Console.ReadKey();
        }
    }
}

So here is a very simple routine that does nothing more than start Excel, add a workbook, and then shut down.  It looks correct right?  Unfortunately it is not.  When we run this code, execution pauses at the end and waits for input.  What we can see by viewing task manager (or tasklist) that Excel is still running despite the fact that we thought we had taken all of the necessary steps to ensure a clean shut down.

Clearly there is an RCW that is still holding a reference to Excel somewhere, but it is non-obvious from looking at the code.  To get to the bottom of this we'll need to take a look at the managed heap.  Fortunately, we can do this with the SOS ("Son Of Strike") debugging extension.  If you have installed WinDBG, you already have SOS.  If you haven't installed, WinDBG, you can get it free from Microsoft here: https://www.microsoft.com/whdc/devtools/debugging/default.mspx

If you are familiar with WinDBG, you are probably aware that it has an open interface that allows for the development of custom, pluggable debugging tools to extend the base debugging functionality.  As I mentioned, SOS is one such extension.  To load SOS in WinDBG, you would type the following in the command window:

.loadby sos mscorwks

Once SOS is loaded, you can see what it does by typing "!help".  To get help for individual commands, you can simply type "!help <command>".

What we want to do is figure out what is holding on to Excel.  To start with, we can find out all of the objects on the managed heap by using the !dumpheap command. 

0:005> !dumpheap -stat
total 6901 objects
Statistics:
      MT    Count    TotalSize Class Name
79132f9c        1           12 System.Collections.Generic.GenericEqualityComparer`1[[System.String, mscorlib]]
79116b3c        1           12 System.Security.Permissions.ReflectionPermission
79116a1c        1           12 System.Security.Permissions.FileDialogPermission
79116998        1           12 System.Security.PolicyManager
791135c8        1           12 System.Resources.FastResourceComparer
7910a9e8        1           12 System.RuntimeTypeHandle
7910746c        1           12 System.DBNull
791073ac        1           12 System.Empty
7910556c        1           12 System.__Filters
7910551c        1           12 System.Reflection.Missing
791045e4        1           12 System.RuntimeType+TypeCacheQueue
7912ed84        1           16 System.SByte[]
7912d8dc        1           16 System.Collections.ObjectModel.ReadOnlyCollection`1[[System.Reflection.CustomAttributeData, mscorlib]]
7912d274        1           16 System.Collections.ObjectModel.ReadOnlyCollection`1[[System.Reflection.CustomAttributeTypedArgument, mscorlib]]
7911baf8        1           16 System.Enum+HashEntry
79113ea0        1           16 System.Security.Permissions.FileIOAccess
79113744        1           16 System.Resources.ResourceReader+TypeLimitingDeserializationBinder
79112510        1           16 System.Globalization.GlobalizationAssembly
791106e4        1           16 System.Security.Permissions.UIPermission
79108934        1           16 System.Reflection.Cache.InternalCache
790fa4f8        1           16 System.__ComObject
008cdbfc        1           16 Microsoft.Office.Interop.Excel.WorkbookClass
008c5e84        1           16 Microsoft.Office.Interop.Excel.ApplicationClass

...

790fcb30     2466       217980 System.String
Total 6901 objects

The "stat" flag limits the output to just the objects themselves--very handy.  If we put our cursor on the first line of the output, we can use the find window to search for "Microsoft.Office.Interop".  This shows us the typed RCWs that are on the heap.  However, all we see are:

008cdbfc        1           16 Microsoft.Office.Interop.Excel.WorkbookClass
008c5e84        1           16 Microsoft.Office.Interop.Excel.ApplicationClass

We know from checking our code that we are calling Marshal.FinalReleaseComObject on these, so they can't be the problem.  (As a side note, if we wanted to dump these objects, we could call !dumpheap -mt <method table> and it would give us the address that we could then use to call !dumpobj.)

What else could be going on?  Well, we have accounted for all of the typed RCWs, but we haven't checked for System.__ComObject.  To look for a specific type, we can make use of the !dumpheap type flag:

0:005> !dumpheap -type System.__ComObject
 Address       MT     Size
02893294 790fa4f8       16    
total 1 objects
Statistics:
      MT    Count    TotalSize Class Name
790fa4f8        1           16 System.__ComObject
Total 1 objects

Aha!  This must be the RCW that is causing problems.  Now we just need to figure out where it is coming from.  Let's start by trying to figure out who owns it.  We'll do that by looking for the root using the !gcroot command:

0:005> !gcroot 02893294
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 0 OSTHread f48
Scan Thread 2 OSTHread c3c

Uh oh, the object is already eligible for collection.  We know this because there are no roots shown.  It looks like we need to start debugging earlier when this object is still alive.  The problem is that managed source level debugging doesn't work very well in WinDBG at the moment.

What to do?  Take advantage of one of the best kept secrets out there: the Visual Studio 2005 debugger can load debugger extensions.  To load SOS in the Visual Studio Debugger, enable native debugging in your project, start debugging and then simply go to the immediate window and type ".load sos".  Subsequent SOS commands can be entered in the immediate window exactly as they would be entered in the WinDBG command window.

So we'll open our project in Visual Studio, turn on native debugging, hit F10 (single step) and load SOS:

.load sos

extension C:\Windows\Microsoft.NET\Framework\v2.0.50727\sos.dll loaded

Now that we have SOS loaded, we'll try setting a breakpoint on the first Marshal.FinalReleaseComObject call. After running to that location, we'll take a look at the ComObject again to see if we've managed to catch it while it is still rooted.

!dumpheap -type System.__ComObject

Address MT Size

02a83294 790fa4f8 16

02a83494 790fa4f8 16

total 2 objects

Statistics:

MT Count TotalSize Class Name

790fa4f8 2 32 System.__ComObject

Total 2 objects

Well, we see two ComObjects now. Lets find out who they belong to.

!gcroot 02a83294

Note: Roots found on stacks may be false positives. Run "!help gcroot" for

more info.

Error during command: Warning. Extension is using a callback which Visual Studio does not implement.

Scan Thread 5988 OSTHread 1764

Scan Thread 3964 OSTHread f7c

!gcroot 02a83494

Note: Roots found on stacks may be false positives. Run "!help gcroot" for

more info.

Error during command: Warning. Extension is using a callback which Visual Studio does not implement.

Scan Thread 5988 OSTHread 1764

Scan Thread 3964 OSTHread f7c

DOMAIN(004514F0):HANDLE(Strong):18118c:Root:02a83494(System.__ComObject)

The second ComObject is rooted to a GCHandle, which means that it was probably created by some internal implementation and we can ignore it for now. The first ComObject is not rooted--which means that it is probably the one we are looking for, but we still don't know who is allocating it. To find out, we'll try stopping at each line of code and dumping ComObjects.

What we discover with this approach is that the ComObject does not appear until we step over the following line of code. 

Workbook

workbook = excel.Workbooks.Add(Type.Missing);

However, it is still unrooted. That can only mean one thing--this line of code is creating an RCW on the heap but not referencing it. Looking carefully at the code, we finally see what we have been missing. The Workbooks accessor (of the ApplicationClass) creates a Workbooks interface instance and returns it. Since we don't assign the return value to anything, there is nothing to reference it beyond the scope of the call. Presumably, the object backing the Workbooks interface does't implement IProvideClassInfo, so the RCW is created as a System.__ComObject.

We can actually confirm this with a little creative stepping in the dissembly window:

Workbook workbook = excel.Workbooks.Add(Type.Missing);

0000003e mov ecx,esi

00000040 mov eax,dword ptr [ecx]

00000042 call dword ptr [eax+000000F8h]

00000048 mov edi,eax

After stepping over the first call (which invokes the Workbooks accessor), we dump the ComObjects again:

!dumpheap -type System.__ComObject

Address MT Size

02763294 790fa4f8 16

027632a4 790fa4f8 16

027632b4 790fa4f8 16

total 3 objects

Statistics:

MT Count TotalSize Class Name

790fa4f8 3 48 System.__ComObject

Total 3 objects

The last one proves our hypothesis:

!gcroot 027632b4

Note: Roots found on stacks may be false positives. Run "!help gcroot" for

more info.

Error during command: Warning. Extension is using a callback which Visual Studio does not implement.

Scan Thread 3924 OSTHread f54

ESP:47ef64:Root:027632b4(System.__ComObject)

Scan Thread 4672 OSTHread 1240

 What we see is that the ComObject is currently rooted to the stack.  Now after we step over the remainder of the code for that line, we can dump the heap again and we will see that object has lost its root (which is what we expect).

!gcroot 027632b4

Note: Roots found on stacks may be false positives. Run "!help gcroot" for

more info.

Error during command: Warning. Extension is using a callback which Visual Studio does not implement.

Scan Thread 3924 OSTHread f54

Scan Thread 4672 OSTHread 1240

In order to effect a clean shutdown, it would seem that we need to refactor our code so that we can call Marshal.FinalReleaseComObject on the Workbooks object. We'll need to assign the return value of the Workbooks accessor to a variable. Here is the corrected code:

        static void Main(string[] args)
        {
            ApplicationClass excel = new ApplicationClass();
            Workbooks workbooks = excel.Workbooks;
            Workbook workbook = workbooks.Add(Type.Missing);
            Marshal.FinalReleaseComObject(workbook);
            Marshal.FinalReleaseComObject(workbooks);
            excel.Quit();  
            Marshal.FinalReleaseComObject(excel);
           
            Console.ReadKey();
        }

Upon running it, we see that we have solved the problem and Excel now shuts down cleanly when we expect it to.

Hopefully, this demonstration has reinforced the perils of RCW usage.  When writing interop code, it is important to avoid calls that tunnel into the object model because they will orphan RCWs on the heap that you will not be able to access in order to call Marshal.ReleaseComObject.