Troubleshooting ReaderWriterLockSlim issues
One wouldn't think there would be anything to troubleshoot but we somehow keep having problems so I thought I'd share how we debug these issues:
Imagine a situation where you notice your application not making progress - you take a dump and see something like this:
0:1199> !ClrStack OS Thread Id: 0x294c (1199) Call Site [HelperMethodFrame_1OBJ: 0000001b1667de98] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean) System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean) System.Threading.WaitHandle.WaitOne(Int32, Boolean) System.Threading.ReaderWriterLockSlim.WaitOnEvent(System.Threading.EventWaitHandle, UInt32 ByRef, TimeoutTracker) System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker) System.Threading.ReaderWriterLockSlim.TryEnterReadLock(TimeoutTracker) [...]
(let's also assume most of the other threads are in this state) - let's take a closer look:
0:1199> !dso OS Thread Id: 0x294c (1199) RSP/REG Object Name 0000001B1667DDE0 000001b2701f7810 Microsoft.Win32.SafeHandles.SafeWaitHandle 0000001B1667DEE0 000001b2701f77e0 System.Threading.ManualResetEvent 0000001B1667DF88 000001b82c2f6bf0 System.Threading.ReaderWriterCount 0000001B1667E060 000001b220284198 System.Threading.ReaderWriterLockSlim [...]
Let's check out this ReaderWriterLockSlim object:
0:1199> !do 000001b220284198 Name: System.Threading.ReaderWriterLockSlim MethodTable: 00007ffab8dac760 EEClass: 00007ffab8ad3da0 Size: 96(0x60) bytes Fields: MT Field Offset Type VT Attr Value Name 00007ffabc3d24b8 40001d2 50 System.Boolean 1 instance 1 fIsReentrant 00007ffabc403e98 40001d3 30 System.Int32 1 instance 0 myLock 00007ffabc3fc9a0 40001d7 34 System.UInt32 1 instance 1 numWriteWaiters 00007ffabc3fc9a0 40001d8 38 System.UInt32 1 instance 1340 numReadWaiters 00007ffabc3fc9a0 40001d9 3c System.UInt32 1 instance 0 numWriteUpgradeWaiters 00007ffabc3fc9a0 40001da 40 System.UInt32 1 instance 0 numUpgradeWaiters 00007ffabc3d24b8 40001db 51 System.Boolean 1 instance 0 fNoWaiters 00007ffabc403e98 40001dc 44 System.Int32 1 instance -1 upgradeLockOwnerId 00007ffabc403e98 40001dd 48 System.Int32 1 instance -1 writeLockOwnerId 00007ffabc4086c8 40001de 8 ...g.EventWaitHandle 0 instance 000001b183d85990 writeEvent 00007ffabc4086c8 40001df 10 ...g.EventWaitHandle 0 instance 000001b2701f77e0 readEvent 00007ffabc4086c8 40001e0 18 ...g.EventWaitHandle 0 instance 0000000000000000 upgradeEvent 00007ffabc4086c8 40001e1 20 ...g.EventWaitHandle 0 instance 0000000000000000 waitUpgradeEvent 00007ffabc3fb628 40001e3 28 System.Int64 1 instance 1951 lockID 00007ffabc3d24b8 40001e5 52 System.Boolean 1 instance 0 fUpgradeThreadHoldingRead 00007ffabc3fc9a0 40001e7 4c System.UInt32 1 instance 1073741825 owners 00007ffabc3d24b8 40001ed 53 System.Boolean 1 instance 0 fDisposed 00007ffabc3fb628 40001e2 438 System.Int64 1 shared static s_nextLockID >> Domain:Value 000001af5f46feb0:NotInit 000001b473428820:NotInit << 00007ffab8dab368 40001e4 0 ...ReaderWriterCount 0 shared TLstatic t_rwc >> Thread:Value <<
That's very interesting - if you're not familiar with the ReaderWriterLockSlim, you can see the source code here. Like all locks, ReaderWriterLockSlim is optimized for the happy case (i.e. the lock is free). In our case, there are 1340 waiting readers and 1 waiting writer - but what's the state of the lock?
0:1199> ? 0n1073741825 Evaluate expression: 1073741825 = 00000000`40000001
Which means: Writer(s) waiting, one reader currently in the lock.
Here's the decoder ring (from the source code linked above):
//Various R/W masks //Note: //The Uint is divided as follows: // //Writer-Owned Waiting-Writers Waiting Upgraders Num-Readers // 31 30 29 28.......0 // private const uint WRITER_HELD = 0x80000000; private const uint WAITING_WRITERS = 0x40000000; private const uint WAITING_UPGRADER = 0x20000000;
So, we have a reader that's not coming out - but which one? For that, we need to understand how the ReaderWriterLockSlim keeps track of the lock holders - again, it's optimized for the happy case:
private ReaderWriterCount GetThreadRWCount(bool dontAllocate) { ReaderWriterCount rwc = t_rwc; ReaderWriterCount empty = null; while (rwc != null) { if (rwc.lockID == this.lockID) return rwc; [...]
In summary, there's a thread local (no contention!) linked list of items (for all ReaderWriterLockSlim's), identified by the lockID of the lock you're holding. With that information, we can find the lone reader:
Here's what a ReaderWriterCount looks like:
0:1199> !do 000001b82c2f6bf0 Name: System.Threading.ReaderWriterCount MethodTable: 00007ffab8dab368 EEClass: 00007ffab8ad3980 Size: 48(0x30) bytes File: D:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll Fields: MT Field Offset Type VT Attr Value Name 00007ffabc3fb628 40001cd 10 System.Int64 1 instance 1951 lockID 00007ffabc403e98 40001ce 18 System.Int32 1 instance 0 readercount 00007ffabc403e98 40001cf 1c System.Int32 1 instance 0 writercount 00007ffabc403e98 40001d0 20 System.Int32 1 instance 0 upgradecount 00007ffab8dab368 40001d1 8 ...ReaderWriterCount 0 instance 000001b82c2f4b60 next
So, the idea is to dump all of the instances of that class that have the right lockID and the readerCount set to 1 - I don't know about you, but I don't like to grovel through thousands of objects, I'd rather spend the time writing a command I can reuse:
0:1199> .foreach (rwc {!DumpHeap -MT 00007ffab8dab368 -short}){.if ((qwo(${rwc}+10) == 0n1951) and (dwo(${rwc}+0x18) > 0)) {!do rwc}}
Which returns...nothing... (the command above basically says: foreach instance of ReaderWriterCount, if the lockId is 1951 and the readerCount is greater than 0, dump the object) - that means the lock was orphaned - i.e. the thread that called EnterReaderLock exited without ever calling Leave :-(
Note that sosex figures that out without resorting to arcane debugger commands:
0:1199> !rwlock 000001b220284198 WriteLockOwnerThread: None UpgradableReadLockOwnerThread: None ReaderCount: 1 ReaderThreadIds: WaitingReaderCount: 1340 WaitingReaderThreadIds: 0x6b,0xdc,0x134,0x113,0x12d,0x193,0xcc,0x100,0x14[...] WaitingWriterCount: 1340 WaitingWriterThreadIds: 0x7d WaitingUpgradableReaderCount: 0 WaitingUpgradableReaderThreadIds: None WaitingWriterUpgradeCount: 0 WaitingWriterUpgradeThreadIds: None *This lock has 1 orphaned reader lock.
Unfortunately, there isn't much more we can tell at this point - but if we had found one, we could just gcroot on it to see which thread owned it:
0:1199> !gcroot 000001b82c2f6bf0 Thread 294c: 0000001b1667e080 00007ffab92d932c System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker) [f:\dd\ndp\fx\src\Core\system\threading\ReaderWriterLockSlim\ReaderWriterLockSlim.cs @ 305] rbx: (interior) -> 000001b82c2f4b38 System.Object[] -> 000001b82c2f6bf0 System.Threading.ReaderWriterCount 0000001b1667e080 00007ffab92d932c System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker) [f:\dd\ndp\fx\src\Core\system\threading\ReaderWriterLockSlim\ReaderWriterLockSlim.cs @ 305] r14: -> 000001b82c2f6bf0 System.Threading.ReaderWriterCount
I hope this was useful - let me know in the comment sections if you want to see more of this kind of stuff.