Identifying the root of a memory leak in Silverlight using windbg
Considering all of the nasty blog comments out there, you would think it would be easy to create a test project that reproduces a leak. It wasn’t. I downloaded the latest version of SL4 and tried to create a leak using an inline template, then a control template, then listbox items. All of these earlier issues now appear to be resolved: I could not reproduce a leak with any of them on Build 4.0.60310.0 (released April 19, 2011). In reviewing the release history (which I should have done before trying to reproduce these myself), I can see that various leaks were already fixed: nested popup (Build 4.0.60310.0), a controls leak involving INotifyDataErrorInfo.ErrorsChanged, DataGrid leak (4.0.60310.0), inline data template leak (4.0.60129.0), and mouse capture, UserControl and various other leaks (4.0.50826.0).
So I ended up creating a more mundane leak, one that can occur if you register an event handler (in your view, say) on an object that stays alive (such as a shared model) and never unregister it. As long as the event publisher instance is alive, it holds a reference to your handler, and through the handler to your view. If you have something more exciting in a simple demo project, please email me.
First I’ll show you this with windbg (free), and then I’ll show you in ANTS (my new favorite debugging tool, but not free). Regardless of the tool, it helps to have an idea in advance of where the leak(s) may be. In other words, the tools don’t know the software’s intention. If you intend for something to be collected, you can use these techniques to test whether it actually is. You can also check the number of instances and the size of objects, and inspect those in Gen 2 and the LOH as prime suspects, but this walkthrough only touches on that technique.
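The retention pattern itself is not specific to Silverlight. As a rough analogy only (this is Python, not the sample project's code, and `SharedModel` and `View` are hypothetical names invented for illustration), the sketch below shows a long-lived publisher keeping a subscriber alive through a handler it still holds, and the subscriber becoming collectable once the handler is removed.

```python
import gc
import weakref


class SharedModel:
    """Hypothetical long-lived event publisher (stands in for a shared model)."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def unsubscribe(self, handler):
        self._handlers.remove(handler)


class View:
    """Hypothetical short-lived subscriber (stands in for a page or view)."""

    def __init__(self, model):
        # The bound method stored by the model holds a strong reference
        # back to this View instance.
        model.subscribe(self.on_changed)

    def on_changed(self):
        pass


def is_collected(ref):
    """Force a collection, then report whether the weak reference is dead."""
    gc.collect()
    return ref() is None


model = SharedModel()

view = View(model)
probe = weakref.ref(view)  # lets us observe the view without keeping it alive
del view

# The publisher still holds view.on_changed, so the view cannot be collected.
leaked = not is_collected(probe)

# Removing the handler drops the last strong reference to the view.
model.unsubscribe(probe().on_changed)
released = is_collected(probe)

print("still alive while subscribed:", leaked)
print("collected after unsubscribe:", released)
```

The weak reference plays the role that the heap inspection plays in the rest of this walkthrough: it answers "is the object still there after a forced GC?" without itself keeping the object alive.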
Step 1
- Download the leaky sample zip file, unzip it and open the VS 2010 project, set the web project as the startup project, and press F5 to build and run it once. Copy the localhost url (including port) to notepad, and stop debugging in VS, since we will be using windbg for this exercise.
- Open that url directly in IE (you may want to close other instances of IE except for these instructions).
- Example: https://localhost:21589/LeakyExamplesSLTestPage.aspx#/Home
- Click “eventpinned”, then click back to home, then click “Force a GC”.
Step 2
- If you haven’t already, install ProcessExplorer (free from Microsoft’s sysinternals). Use the target symbol from the toolbar: drag it onto your running SL app to find and note the correct process ID. Also check that process’s “Image Type” column in ProcessExplorer to determine whether it is 32-bit or 64-bit. If you don’t see that column in ProcessExplorer, right-click the column header and add it.
Step 3
- Open the version of debugdiag that corresponds to your process Image Type (e.g. DebugDiag (x86) for 32-bit). (Run → debugdiag).
- Find your process by ID, right-click, “create full user dump”
Step 4 (enter windbg)
- Open the corresponding version of Microsoft’s windbg (x86 for debugging your 32-bit process) by right-clicking it and choosing “Run as Administrator”. If you haven’t downloaded it yet, go to the windbg download page; if you scroll to the bottom you will find direct links to both versions.
- Open the dump file you just created (File | Open Crash Dump)
- Now let’s load the Silverlight CoreCLR data access DLL (mscordaccore) and the Silverlight debugging extension (sos):
0:000> .load C:\Program Files (x86)\Microsoft Silverlight\4.0.60531.0\mscordaccore.dll
0:000> .load C:\Program Files (x86)\Microsoft Silverlight\4.0.60531.0\sos.dll
- Now let’s check the managed threads (remember from the previous session, though, that your Silverlight thread may not show up in the managed threads; you may need to execute ~*kL):
0:000> !threads
ThreadCount: 3
UnstartedThread: 0
BackgroundThread: 3
PendingThread: 0
DeadThread: 0
Hosted Runtime: yes
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
4 1 4d90 08b47888 220 Enabled 0a858950:0a859fe8 08a10978 0 STA
29 2 578c 08a4f028 b220 Enabled 00000000:00000000 08a10978 0 MTA (Finalizer)
30 3 21c0 08a52ab0 1220 Enabled 00000000:00000000 08a10978 0 Ukn
- Okay, that confirms we’ve got sos loaded. Since we are not stopped at a breakpoint anywhere interesting in the code, we have no use for !clrstack and don’t need to track down the thread right now. Instead we care about what is held in memory. Let’s get a summary of what’s in the heap:
0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0a661018
generation 1 starts at 0x0a66100c
generation 2 starts at 0x0a661000
ephemeral segment allocation context: none
segment begin allocated size
0a660000 0a661000 0a859ff4 0x1f8ff4(2068468)
Large object heap starts at 0x0b661000
segment begin allocated size
0b660000 0b661000 0b668260 0x7260(29280)
Total Size: Size: 0x200254 (2097748) bytes.
------------------------------
GC Heap Size: Size: 0x200254 (2097748) bytes.
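If you capture more than one dump, the segment rows in this output are the numbers worth comparing. Below is a minimal sketch, assuming the `!eeheap -gc` text was copied out of windbg into a string; the function names and the `soh`/`loh` labels are made up for illustration. It sums the segment sizes for the small object heap and the large object heap, and it does not try to split the ephemeral segment into generations.

```python
import re

# One row of a segment table: segment, begin, allocated, size as 0xHEX(decimal).
SEGMENT_ROW = re.compile(
    r"^([0-9a-f]{8})\s+([0-9a-f]{8})\s+([0-9a-f]{8})\s+0x[0-9a-f]+\((\d+)\)$"
)


def heap_sizes(eeheap_text):
    """Return total segment bytes per area: {'soh': ..., 'loh': ...}."""
    sizes = {"soh": 0, "loh": 0}
    area = "soh"  # rows before the LOH banner belong to the small object heap
    for line in eeheap_text.splitlines():
        line = line.strip()
        if line.startswith("Large object heap starts at"):
            area = "loh"
            continue
        row = SEGMENT_ROW.match(line)
        if row is None:
            continue
        begin, allocated = int(row.group(2), 16), int(row.group(3), 16)
        # The printed size should equal allocated - begin; fail loudly if not.
        if allocated - begin != int(row.group(4)):
            raise ValueError("size mismatch in row: " + line)
        sizes[area] += allocated - begin
    return sizes


def growth(before, after):
    """Byte growth per area between two parsed snapshots."""
    return {area: after[area] - before[area] for area in before}


# The output shown above, pasted as-is.
snapshot = """\
Number of GC Heaps: 1
generation 0 starts at 0x0a661018
generation 1 starts at 0x0a66100c
generation 2 starts at 0x0a661000
ephemeral segment allocation context: none
segment begin allocated size
0a660000 0a661000 0a859ff4 0x1f8ff4(2068468)
Large object heap starts at 0x0b661000
segment begin allocated size
0b660000 0b661000 0b668260 0x7260(29280)
Total Size: Size: 0x200254 (2097748) bytes.
"""

sizes = heap_sizes(snapshot)
print(sizes)  # {'soh': 2068468, 'loh': 29280}
```

With a second dump taken after repeating the navigation a few times, `growth(heap_sizes(first), heap_sizes(second))` would show which area is growing.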
- Interesting, but not what we are looking for. If you had a serious memory leak, you would see Gen 2 and the LOH grow here between two memory dump snapshots. Now let’s look at all the objects in the heap. Remember, we are checking whether our view (EventPinnedPage) has been GC’ed or not:
0:000> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
7aa1a178 1 12 System.Windows.Hosting.ManagedRuntimeHost
7aa1941c 1 12 System.Windows.Browser.ManagedObjectInfo+ScriptMemberGroup
[ ---- RESULTS CUT OUT FOR BREVITY --- ]
794aff90 135 250928 System.Byte[]
794b1a50 8284 274040 System.Object[]
794c05e8 7939 365732 System.String
Total 45639 objects
- Well, there are a lot of instances in the heap. The System.String instances are typically expected and nothing to worry about, but I can’t find my “EventPinnedPage” in this huge list. Let’s try a shell search for it:
0:000> .shell -ci "!dumpheap -stat" find "Pinned"
084b2164 2 208 LeakyExamplesSL.Views.EventPinnedPage
.shell: Process exited
Bingo! We found it, and it’s still in memory. Let’s take a look at it:
0:000> !dumpmt -md 084b2164
EEClass: 070c5c0c
Module: 06dc49e0
Name: LeakyExamplesSL.Views.EventPinnedPage
mdToken: 02000007
File: LeakyExamplesSL, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
BaseSize: 0x68
ComponentSize: 0x0
Slots in VTable: 67
Number of IFaces in IFaceMap: 4
--------------------------------------
MethodDesc Table
Entry MethodDesc JIT Name
792ca980 790ca0e4 PreJIT System.Object.ToString()
792ca9a0 790ca0ec PreJIT System.Object.Equals(System.Object)
792caa10 790ca10c PreJIT System.Object.GetHashCode()
792caa20 790ca124 PreJIT System.Object.Finalize()
082c65a0 084b2118 JIT LeakyExamplesSL.Views.EventPinnedPage..ctor()
082c6780 084b210c JIT LeakyExamplesSL.Views.EventPinnedPage.InitializeComponent()
[-- deleted for brevity --- ]
06dcca75 084b2120 NONE LeakyExamplesSL.Views.EventPinnedPage.commander_SomethingChanged(System.Object, System.EventArgs)
082c6ca0 084b212c JIT LeakyExamplesSL.Views.EventPinnedPage.EventPinnedPage_Unloaded(System.Object, System.Windows.RoutedEventArgs)
- Okay, but to find out what is keeping our instance(s) stuck in memory, we first need to get to the instances themselves. This command lists all instances of a type in memory, their sizes and, most importantly, their virtual addresses:
0:000> !dumpheap -type LeakyExamplesSL.Views.EventPinnedPage
Address MT Size
0a78b844 084b2164 104
0a7ffc30 084b2164 104
total 0 objects
Statistics:
MT Count TotalSize Class Name
084b2164 2 208 LeakyExamplesSL.Views.EventPinnedPage
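With only two instances the addresses are easy to copy by hand. For longer lists, a short script can pull them out of saved output. This is a minimal sketch, assuming the `!dumpheap -type` text was pasted into a string; `instance_addresses` is a name made up for illustration. It filters on the MT column because `-type` matches on substrings of the class name and can therefore return other types as well.

```python
import re

# A per-object row: address, method table, size in bytes (decimal).
OBJECT_ROW = re.compile(r"^([0-9a-f]{8})\s+([0-9a-f]{8})\s+(\d+)$")


def instance_addresses(dumpheap_text, method_table):
    """Addresses of the objects whose MT column equals method_table."""
    wanted = method_table.lower()
    addresses = []
    for line in dumpheap_text.splitlines():
        row = OBJECT_ROW.match(line.strip())
        if row and row.group(2) == wanted:
            addresses.append(row.group(1))
    return addresses


# The output shown above, pasted as-is.
output = """\
Address MT Size
0a78b844 084b2164 104
0a7ffc30 084b2164 104
total 0 objects
Statistics:
MT Count TotalSize Class Name
084b2164 2 208 LeakyExamplesSL.Views.EventPinnedPage
"""

# Build one !gcroot command per instance, ready to paste into windbg.
commands = ["!gcroot " + address for address in instance_addresses(output, "084b2164")]
print("\n".join(commands))
# !gcroot 0a78b844
# !gcroot 0a7ffc30
```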
Now let’s take a look at one of those instances and find out what is causing it to stay around, like that one pesky party guest who hangs around after he’s no longer welcome:
0:000> !gcroot 0a78b844
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 4 OSTHread 4d90
Scan Thread 29 OSTHread 578c
Scan Thread 30 OSTHread 21c0
DOMAIN(08A54278):HANDLE(Pinned):6e112f8:Root: 0b664260(System.Object[])->
0a78b8ac(LeakyExamplesSL.Common.MockCommandManager)->
0a801b40(System.EventHandler)->
0a801b28(System.Object[])->
0a78d954(System.EventHandler)->
0a78b844(LeakyExamplesSL.Views.EventPinnedPage)
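The chain reads from the root at the top down to our object at the bottom, so the interesting holder is the first application object you meet walking back up from the bottom. For longer chains, a script can do that walk. A minimal sketch follows; `parse_chain` and `nearest_holder` are made-up names, and treating everything under `System.` as framework plumbing is an assumption that happens to suit this sample.

```python
import re

# Each hop looks like: 0a78b8ac(LeakyExamplesSL.Common.MockCommandManager)
HOP = re.compile(r"([0-9a-f]{8})\(([^()]+)\)")


def parse_chain(gcroot_text):
    """Return [(address, type_name), ...] ordered from root to target."""
    return HOP.findall(gcroot_text)


def nearest_holder(chain, framework_prefix="System."):
    """First non-framework object found walking up from the target."""
    for address, type_name in reversed(chain[:-1]):
        if not type_name.startswith(framework_prefix):
            return address, type_name
    return None


# The root chain shown above, pasted as-is.
output = """\
DOMAIN(08A54278):HANDLE(Pinned):6e112f8:Root: 0b664260(System.Object[])->
0a78b8ac(LeakyExamplesSL.Common.MockCommandManager)->
0a801b40(System.EventHandler)->
0a801b28(System.Object[])->
0a78d954(System.EventHandler)->
0a78b844(LeakyExamplesSL.Views.EventPinnedPage)
"""

chain = parse_chain(output)
print(nearest_holder(chain))
# ('0a78b8ac', 'LeakyExamplesSL.Common.MockCommandManager')
```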
Well, that’s a pretty clear picture: the event handler registered on MockCommandManager is what is keeping the page alive. I must say that at this point Red Gate’s ANTS Memory Profiler produces a wonderful diagram (copied below). After playing around with that tool for a while, I may just shell out the $500 out of pocket to use it.