Debugging a classic ReaderWriterLock deadlock with SOSex.dll
I was helping out on an issue the other day where the process would stall if they added enough users in their load tests. Btw, serious kudos to them for making load tests, so much nicer to work with a problem in test rather than when it is getting critical on a production machine.
We gathered some memory dumps with debug diag of the asp.net process (w3wp.exe) and found that most of the threads were waiting in this type of callstack:
NOTE: I have changed a lot of function names and code snippets since it is not relevant for the post.
0:071> !clrstack
OS Thread Id: 0x21ac (71)
ESP EIP
1c8ceb88 7c8285ec [HelperMethodFrame_1OBJ: 1c8ceb88] System.Threading.ReaderWriterLock.AcquireReaderLockInternal(Int32)
1c8cebe4 1c880d72 DataStore.get_SomeItem()
...
So they were sitting in DataStore.get_SomeItem, waiting to acquire a ReaderLock for some resource. A ReaderWriterLock is used to synchronize access to resources, where you want to allow multiple readers at the same time, or one writer.
To be honest, I can’t say that I see it used all that often, and now there is also a ReaderWriterLockSlim that is similar to the ReaderLock but designed to avoid some of the more common deadlocks.
Sosex from Steve’s Techspot has some really nice commands for reader writer locks that both display information about the RWlocks as well as detects deadlocks between them.
First we have !rwlock… this is similar to !syncblk for locks/monitors and here we can get some basic information about the readerwriterlocks in the process.
0:174> !rwlock
Address ReaderCount WaitingReaderCount WriterThread WaitingWriterCount
---------------------------------------------------------------------------------------
…
0e9167d0 0 0 50 2
0e9167fc 2 45 0 1
We have two interesting locks here (0e9167d0, and 0e9167fc)
For 0e9167d0 (I’ll call this Lock1 in the future), thread 50 owns the WriterLock so no-one else can acquire neither a ReaderLock nor WriterLock for this, and we have 2 threads waiting to acquire the WriterLock on this.
For 0e9167fc (I’ll call this Lock2 in the future), we have two threads that own ReaderLocks on this, we have 1 thread waiting to acquire a WriterLock on it, and 45 other threads waiting to acquire a ReaderLock on it. The thread waiting to acquire the WriterLock is probably first in queue, but until the current readers give up their locks, the WriterLock can’t be acquired.
(I will get back to these thread numbers in a moment because it gets pretty tricky)
We can also dig into a particular lock by running !rwlock <lock address>
0:068> !rwlock 0e9167d0 (<- Lock1)
WriterThread: 50
WriterLevel: 1
WaitingWriterCount: 2
WriterEvent: e54
WaitingWriterThreadIds: 14,47
ReaderCount: 0
CurrentReaderThreadIds: None
WaitingReaderCount: 0
ReaderEvent: 0
WaitingReaderThreadIds: None
0:174> !rwlock 0e9167fc (<-Lock2)
WriterThread: 0
WriterLevel: 0
WaitingWriterCount: 1
WriterEvent: e00
WaitingWriterThreadIds: 50
ReaderCount: 2
CurrentReaderThreadIds: 14,47
WaitingReaderCount: 45
ReaderEvent: e7c
WaitingReaderThreadIds: 54,57,58,61,62,66,67,70,71,74,75,78,79,82,83,86,87,90,91,94,95,98,99,102,103,106,107,110,111,114,115,118,119,122,123,126,127,130,131,134,137,140,143,146,149
With this info, we can now gather that threads 14 and 47 own readerlocks for Lock2, but they won’t give them up until they can get WriterLocks for Lock1. Thread 50 on the other hand, has a WriterLock on Lock1 and wont give it up until it can get the WriterLock on Lock2, hence we have a deadlock.
The good thing here is that if you don’t care to do this investigation on your own, SOSEX has a function that will do it for you called !dlk (deadlock).
0:174> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLocks...
Scanning for lock holders on ReaderWriterLocks...
Scanning for threads waiting on SyncBlocks...
Scanning for threads waiting on ReaderWriterLocks...
Deadlock detected:
CLR thread 14 holds a Reader lock on ReaderWriterLock 0e9167fc
is waiting for a Writer lock on ReaderWriterLock 0e9167d0
CLR thread 50 holds a Writer lock on ReaderWriterLock 0e9167d0
is waiting for a Writer lock on ReaderWriterLock 0e9167fc
CLR Thread 14 is waiting at DataStore.InitializeDataIfNotInitialized()(+0x0 IL)(+0x0 Native)
CLR Thread 50 is waiting at System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)(+0x0 IL)(+0x0 Native)
Deadlock detected:
CLR thread 47 holds a Reader lock on ReaderWriterLock 0e9167fc
is waiting for a Writer lock on ReaderWriterLock 0e9167d0
CLR thread 50 holds a Writer lock on ReaderWriterLock 0e9167d0
is waiting for a Writer lock on ReaderWriterLock 0e9167fc
CLR Thread 47 is waiting at DataStore.InitializeDataIfNotInitialized()(+0x0 IL)(+0x0 Native)
CLR Thread 50 is waiting at System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)(+0x0 IL)(+0x0 Native)
2 deadlocks detected.
Explanation of the Thread IDs coming from the SOSEX RWLock methods
So this is all nice and good, but what are these thread IDs? is it the logical thread ID (1st column in !threads), the CLR ID (2nd column in !threads) or the OSID (OS Thread ID)? Turns out it is neither… or rather it is actually the CLR ID in decimal form:) Steve mentioned he might change this later to be the hex value so stay tuned for new updates of SOSEX for that and other functionality.
0:054> !threads
ThreadCount: 154
UnstartedThread: 1
BackgroundThread: 103
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
16 1 1474 0010c0a8 3808220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Worker)
...
32 d 2520 1abd0948 200b020 Enabled 00000000:00000000 0013fa58 0 MTA
11 e 1fcc 1abdc9e8 a80a220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Completion Port)
33 f 1224 1abe5c10 380b220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Worker)
...
37 13 1cc8 1ac22618 380b220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Worker)
38 14 23e8 1ac229e8 200b020 Enabled 00000000:00000000 0013fa58 0 MTA
39 15 2230 1ac22db8 380b220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Worker)
...
63 2d 23e0 1ab934c0 380b220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Worker)
64 2e 19ac 1ac3d190 200b020 Enabled 00000000:00000000 0013fa58 0 MTA
65 2f 1644 1aba8f40 a80b220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Completion Port)
66 30 10f4 1abaaaa8 380b220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Worker)
67 31 121c 1ac47418 200b020 Enabled 00000000:00000000 0013fa58 0 MTA
68 32 1b00 1abfdd70 a80b220 Enabled 00000000:00000000 0013fa58 1 MTA (Threadpool Completion Port)
69 33 2198 1ac55358 200b220 Enabled 06e140e4:06e150b4 0013fa58 0 MTA
In our case, the threads we were interested in were 14 (0xe) = 11, 47 (0x2f) = 65 and 50 (0x32) = 68
If we take a look at these threads (11 and 68), the stacks look like this:
0:011> !clrstack
OS Thread Id: 0x1fcc (11)
ESP EIP
01c7f250 7c8285ec [HelperMethodFrame_1OBJ: 01c7f250] System.Threading.ReaderWriterLock.AcquireWriterLockInternal(Int32)
01c7f2ac 1c880e07 DataStore.InitializeDataIfNotInitialized()
01c7f2e4 1c880d78 DataStore.get_SomeItem()
...
0:068> !clrstack
OS Thread Id: 0x1b00 (68)
ESP EIP
1c71ed0c 7c8285ec [GCFrame: 1c71ed0c]
1c71ed28 7c8285ec [HelperMethodFrame_1OBJ: 1c71ed28] System.Threading.ReaderWriterLock.FCallUpgradeToWriterLock(System.Threading.LockCookie ByRef, Int32)
1c71ed88 792af05d System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)
1c71edac 1c880e19 DataStore.InitializeDataIfNotInitialized()
1c71ede4 1c880d78 DataStore.get_SomeItem()
...
And the code for the get_SomeItem, and InitializeDataIfNotInitialized methods look something like this:
public static SomeItem get_SomeItem()
{
...
DataLock.AcquireReaderLock(timeOut);
try
{
InitializeDataIfNotInitialized();
...
}
finally
{
DataLock.ReleaseReaderLock();
}
...
}
private static void InitializeDataIfNotInitialized()
{
if (DataIsNotInitialized)
{
try
{
InitLock.AcquireWriterLock(timeOut);
LockCookie lockCookie = DataLock.UpgradeToWriterLock(timeOut);
try
{
...
}
finally
{
DataLock.DowngradeFromWriterLock(ref lockCookie);
}
}
finally
{
InitLock.ReleaseWriterLock();
}
}
}
So what happened here was that thread 11 and 68 both grabbed the DataLock ReaderLock in get_SomeItem.
Then 68 grabbed the WriterLock for InitLock in InitializeDataIfNotInitialized, and now it is trying to upgrade the DataLock to a WriterLock but can’t do so since thread 11 has the ReaderLock on it.
Thread 11 on the other hand now needs to grab the WriterLock on the InitLock but can’t do so because 68 owns it.
Conclusion
This is a pretty typical deadlock situation, and the solution is to re-design the initialization methods so that this can never happen, but hopefully the walkthrough has given some insight into how you can debug these types of issues with SOSEX if you do run into them.
Have a good one,
Tess
Comments
Anonymous
April 27, 2010
Thank you for the write up. It was Neatly Explained, Simple and short to the point. RegardsAnonymous
April 26, 2011
I'm wondering why didn't the 'timeout' work.Anonymous
July 23, 2012
Hi, i try to investigate a hung with windbg. If I call the command !mlocks i got the following :000> !mlocks Examining SyncBlocks... Scanning for ReaderWriterLock instances... Scanning for holders of ReaderWriterLock locks... Scanning for ReaderWriterLockSlim instances... Scanning for holders of ReaderWriterLockSlim locks... Examining CriticalSections... ClrThread DbgThread OsThread LockType Lock LockLevel
0x640064 -1 0xffffffff RWLock 000000000339a338 Writer 0x6 7 0x1ea8 thinlock 000000000343ddd8 (recursion:0) When executeing rwlocks i got the following: 0:000> !rwlocks 000000000339a338 No export rwlocks found 0:000> !rwlock 000000000339a338 WriterThread: 0x640064 (DEAD) WriterLevel: 115 WaitingWriterCount: 0 WriterEvent: 0 WaitingWriterThreadIds: None ReaderCount: 116 CurrentReaderThreadIds: WaitingReaderCount: 576 ReaderEvent: 80400002 WaitingReaderThreadIds: *This lock has 116 orphaned reader locks. 0:007> !rwlock Address ReaderCount WaitingReaderCount WriterThread WaitingWriterCount ... 000000000339a338 116 576 0x640064 0 ... 00000000053f0688 568 499 -- 6 ... i got this. When I call 0:000> !dlk Examining SyncBlocks... Scanning for ReaderWriterLock instances... Scanning for holders of ReaderWriterLock locks... Scanning for ReaderWriterLockSlim instances... Scanning for holders of ReaderWriterLockSlim locks... Examining CriticalSections... Scanning for threads waiting on SyncBlocks... Scanning for threads waiting on ReaderWriterLock locks... Scanning for threads waiting on ReaderWriterLocksSlim locks... Scanning for threads waiting on CriticalSections... No deadlocks detected. then no deadlock will be detected. My question is if this threads have to do with my hunging application, and what the scenario could be. What does it mean that the thread is DEAD. Or do I have to find somewhere else to find out the root cause of my hunging application? Please help me to interpret this output. Regards, Bernhard