.NET Matters
Deadlock monitor
Stephen Toub
Code download available at: NETMatters2007_10.exe (156 KB)
Q I'm using locks in my application to synchronize work on a bunch of threads. Unfortunately, I'm doing something incorrectly and my threads seem to just stop sometimes. I think I'm running into deadlocks, but I'm not sure how to find them. Is there any way I can do so programmatically? I'd like an exception to be thrown when a deadlock is encountered.
A First, it's important to understand what a deadlock among threads is and the conditions that lead to one. Threads deadlock when waiting on each other to release some resources, but by performing that blocking wait, they're not releasing the resources the other threads need in order to unblock. The threads can't make any progress until the resources are released, but because they're not making progress, the resources will never be released, the threads are locked up, and thus "deadlock." Many OS course textbooks will cite the four conditions necessary for a deadlock to occur:
- A limited number of a particular resource. In the case of a monitor in C# (what you use when you employ the lock keyword), this limited number is one, since a monitor is a mutual-exclusion lock (meaning only one thread can own a monitor at a time).
- The ability to hold one resource and request another. In C#, this is akin to locking on one object and then locking on another before releasing the first lock, for example:
lock(a) { lock(b) { ... } }
- No preemption capability. In C#, this means that one thread can't force another thread to release a lock.
- A circular wait condition. This means that there is a cycle of threads, each of which is waiting for the next to release a resource before it can continue.
If any one of these conditions is not met, deadlock is not possible. The first condition is inherent to what a monitor is, so if you're using monitors, this one is set in stone. The second condition could be avoided by ensuring that you only ever lock one object at a time, but that's frequently not a feasible requirement in a large software project. The third condition could possibly be avoided in the Microsoft® .NET Framework by aborting or interrupting the thread holding the resource your thread requires, but a) that would require knowing which thread owned the resource, and b) that's an inherently dangerous operation (for the many reasons why, see msdn.microsoft.com/msdnmag/issues/05/10/Reliability). Thus, the way to avoid deadlocks is to avoid (or thwart) condition four.
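To make the circular wait concrete, here's a minimal sketch (the lock names, sleeps, and timing are purely illustrative) of two threads that acquire the same two monitors in opposite order; each ends up holding the lock the other needs, and neither Join ever returns:

using System;
using System.Threading;

class DeadlockDemo {
    static readonly object a = new object();
    static readonly object b = new object();

    static void Main() {
        Thread t1 = new Thread(delegate() {
            lock (a) {                 // t1 holds a...
                Thread.Sleep(100);     // ...long enough for t2 to grab b
                lock (b) { }           // ...then blocks forever waiting for b
            }
        });
        Thread t2 = new Thread(delegate() {
            lock (b) {                 // t2 holds b...
                Thread.Sleep(100);     // ...long enough for t1 to grab a
                lock (a) { }           // ...then blocks forever waiting for a
            }
        });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();          // neither call ever returns
    }
}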
In his article in the April 2006 issue (available at msdn.microsoft.com/msdnmag/issues/06/04/Deadlocks), Joe Duffy discusses several techniques for avoiding and detecting deadlocks, including one known as lock leveling. In lock leveling, locks are assigned numerical values, and threads must only acquire locks that have higher numbers than locks they have already acquired. This prevents the possibility of a cycle. It's also frequently difficult to do well in a typical software application today, and a failure to follow lock leveling on every lock acquisition invites deadlock.
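To give a flavor of the discipline (this is my own minimal sketch, not code from Joe's article, and the type and field names are hypothetical), a leveled lock might track the levels held by the current thread and refuse any acquisition that doesn't increase the level:

using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of lock leveling: a thread may only acquire a lock whose
// level is strictly greater than any level it already holds, so no wait cycle
// can ever form. (Reentering a lock you already hold is disallowed here for
// simplicity.)
class LeveledLock {
    [ThreadStatic] private static Stack<int> _levelsHeld; // levels held by this thread
    private readonly int _level;

    public LeveledLock(int level) { _level = level; }

    public void Enter() {
        if (_levelsHeld == null) _levelsHeld = new Stack<int>();
        if (_levelsHeld.Count > 0 && _level <= _levelsHeld.Peek())
            throw new InvalidOperationException(
                "Lock-leveling violation: acquiring level " + _level +
                " while already holding level " + _levelsHeld.Peek());
        Monitor.Enter(this);
        _levelsHeld.Push(_level);
    }

    public void Exit() {
        _levelsHeld.Pop();
        Monitor.Exit(this);
    }
}

A violation then surfaces immediately as an exception on the offending thread, rather than as an intermittent hang.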
Rather than prevent deadlocks wholesale, many systems attempt to detect them and then eliminate them once found. For example, SQL Server® can detect when deadlocks occur and abort one of the tasks involved in the cycle, thereby removing the deadlock. In his article, Joe builds a common language runtime (CLR) host that is capable of this form of deadlock detection in .NET applications that use monitors, a very cool feat. Unfortunately, using a custom CLR host isn't always practical for many .NET apps, including those that already have a custom host, like ASP.NET applications. As such, it would be beneficial to be able to utilize similar deadlock detection capabilities without the need for a custom CLR host, so that these types of deadlocks could be detected at run time. This would be valuable during development and testing, to identify coding errors where locks are being used incorrectly. It could also be used in production to detect and eliminate deadlocks as they occur (preventing a thread from completing a cycle by blocking it from attempting the critical wait that would close it); however, typical deadlock detection algorithms are costly and may not be appropriate in production systems for performance reasons. (I'll have a few comments on performance at the end of this column.)
To address this need, I've built a sample wrapper for the .NET System.Threading.Monitor class that includes deadlock detection capabilities. As with Monitor, my DdMonitor class provides Enter and Exit methods, and under the covers it delegates to the equivalent methods on Monitor. However, it also tracks monitor usage and throws exceptions when the attempted acquisition of a lock will complete a cycle, which would result in deadlock. For the rest of this column, I'll detail the implementation of this deadlock monitor class and provide additional information on its usage as well as on the advantages and disadvantages of its deployment.
The outline for the DdMonitor class is shown in Figure 1. From the start, note that DdMonitor does not completely mimic the public interface of System.Threading.Monitor. It provides the same public static TryEnter, Enter, and Exit methods, but it does not provide the Wait, Pulse, and PulseAll static public methods of its counterpart.
Figure 1 DdMonitor Class
class DdMonitor {
    public static IDisposable Lock(object monitor) {
        ...
    }
    public static void Enter(object monitor) {
        ...
    }
    public static bool TryEnter(object monitor) {
        ...
    }
    public static bool TryEnter(object monitor, TimeSpan timeout) {
        ...
    }
    public static bool TryEnter(object monitor, int millisecondsTimeout) {
        ...
    }
    public static void Exit(object monitor) {
        ...
    }
}
DdMonitor also provides a public static Lock method. Since the C# lock keyword provides a nice abstraction over Monitor, it's beneficial to provide a similar one for DdMonitor. Without the ability to modify the C# language, it's impossible for us to obtain identical syntax, but we can get close:
// with Monitor
lock(obj) { ... }
// with DdMonitor
using(DdMonitor.Lock(obj)) { ... }
This syntactical feat is accomplished by implementing Lock, as shown in Figure 2. The Lock method delegates to DdMonitor.Enter to acquire the lock. However, it also instantiates a DdMonitorCookie object that implements IDisposable. This object's Dispose method will call DdMonitor.Exit to release the lock; this enables the developer to wrap DdMonitor.Lock in a C# using statement (Using in Visual Basic® or stack-allocation in C++/CLI) as shown previously.
Figure 2 Implementing the Lock Method
public static IDisposable Lock(object monitor) {
    if (monitor == null) throw new ArgumentNullException("monitor");
    IDisposable cookie = new DdMonitorCookie(monitor);
    Enter(monitor);
    return cookie;
}

private class DdMonitorCookie : IDisposable {
    private object _monitor;

    public DdMonitorCookie(object obj) {
        _monitor = obj;
    }

    public void Dispose() {
        if (_monitor != null) {
            DdMonitor.Exit(_monitor);
            _monitor = null;
        }
    }
}
The Enter method and two of the three TryEnter methods are simple wrappers around the third TryEnter method, as shown in Figure 3. These implementations are meant to follow the same specification as Monitor's Enter and TryEnter methods. Calling Enter will block until the lock is acquired (we differ slightly by design in that our Enter method will throw an exception if a deadlock is detected, rather than blocking forever), and thus Enter delegates to TryEnter with a timeout of Timeout.Infinite. Note that Timeout.Infinite equals -1, a special value that signals TryEnter to block until the lock is acquired rather than failing after the specified time. In contrast, the overload of TryEnter that doesn't accept a time value defaults to a timeout of 0, meaning it will return false if the lock cannot be acquired immediately (again, our implementation will also throw an exception if acquiring the lock would cause a deadlock; this decision is a debatable point with TryEnter, however, so you may choose to modify the implementation accordingly). Note that TryEnter returns true if the lock was acquired and false otherwise. The second overload of TryEnter simply converts the supplied TimeSpan timeout value into a number of milliseconds, validates the argument, and delegates to the final TryEnter overload, which is where all the real work happens.
Figure 3 Implementing Enter
public static void Enter(object monitor) {
    TryEnter(monitor, Timeout.Infinite);
}

public static bool TryEnter(object monitor) {
    return TryEnter(monitor, 0);
}

public static bool TryEnter(object monitor, TimeSpan timeout) {
    long totalMilliseconds = (long)timeout.TotalMilliseconds;
    if (totalMilliseconds < -1 || totalMilliseconds > Int32.MaxValue)
        throw new ArgumentOutOfRangeException("timeout");
    return TryEnter(monitor, (int)totalMilliseconds);
}
Our design is relatively straightforward, though it does involve a few interesting implementation details. The implementation begins by validating the supplied parameters, ensuring that an object was actually supplied on which to lock and that a valid timeout was supplied (meaning either -1 or a non-negative number of milliseconds).
At this point, DdMonitor needs to access some shared state. Since there's no supported way for DdMonitor to interrogate the CLR for information about a monitor (including which thread may own it and which threads may be waiting on it), DdMonitor has to track this information manually. It does this by storing a static table of data structures that contain all of the relevant information about acquired monitors in the system, and every time an Enter or Exit operation is performed on a monitor, it updates this shared state.
Note that this has a few implications. First, while we'll see that DdMonitor does end up delegating to Monitor with the same monitor object that was supplied by the user, DdMonitor knows nothing about direct calls to Monitor's methods, and thus it's not able to update its internal state information based on any such calls that are made. This means that deadlock detection will not function properly if you mix usage of DdMonitor and Monitor (or the lock keyword) in your app, causing false negatives and possibly not preventing some deadlocks.
Another implication of this implementation is that this static table is AppDomain-specific. This shouldn't be a problem, since objects you lock on should also only live in one AppDomain. However, there are several special kinds of objects (such as Type and String) that are able to cross AppDomain boundaries, and if you lock on one of these domain-agile objects with DdMonitor from multiple domains, the static table maintained by DdMonitor in each domain will only contain a subset of the relevant information, again possibly leading to false negatives.
A third issue with this approach has to do with reliability. A lock statement in C# expands to a call to Monitor.Enter followed immediately by a try block, whose body is equivalent to the body of the lock statement and whose finally block releases the monitor. The CLR, through a bit of JIT hackery, ensures that the try block is entered if the call to Monitor.Enter succeeds; this ensures that the finally block will also be executed if Monitor.Enter succeeds, which is important in the face of asynchronous exceptions that could otherwise occur between the call to Monitor.Enter and entering the try block (for more information, see msdn.microsoft.com/msdnmag/issues/05/10/Reliability and www.bluebytesoftware.com/blog/2007/01/30/MonitorEnterThreadAbortsAndOrphanedLocks.aspx). As you'll see in our DdMonitor implementation, however, there is code that comes after the actual underlying call to Monitor.Enter and before the try block that comes after the call to DdMonitor.Enter; as such, these reliability guarantees are lost with our implementation.
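For reference, on the compilers this column targets, the lock statement expands to roughly the following (the emitted IL additionally benefits from the JIT cooperation just described):

// lock (obj) { /* body */ } expands to approximately:
Monitor.Enter(obj);
try {
    // body of the lock statement
}
finally {
    Monitor.Exit(obj);
}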
You should keep all of these issues in mind in case any is dangerous to your scenarios.
Back to our implementation of TryEnter, which is shown in Figure 4; it continues by obtaining its internal lock for this shared state. Once acquired, it accesses the dictionary to find any previous data on the monitor being entered. If no such data exists, that means that no threads currently own or are waiting on the monitor. In such a case, TryEnter initializes a new MonitorState object to track this monitor and puts it back into the table.
Figure 4 Implementing TryEnter
private static readonly object _globalLock = new object(); // protects _monitorStates
private static Dictionary<object, MonitorState> _monitorStates =
    new Dictionary<object, MonitorState>();

public static bool TryEnter(object monitor, int millisecondsTimeout) {
    if (monitor == null) throw new ArgumentNullException("monitor");
    if (millisecondsTimeout < 0 && millisecondsTimeout != Timeout.Infinite)
        throw new ArgumentOutOfRangeException("millisecondsTimeout");

    bool thisThreadOwnsMonitor = false;
    MonitorState ms = null;
    try {
        lock (_globalLock) {
            if (!_monitorStates.TryGetValue(monitor, out ms)) {
                _monitorStates[monitor] = ms = new MonitorState(monitor);
            }
            if (ms.OwningThread != Thread.CurrentThread) {
                ms.WaitingThreads.Add(Thread.CurrentThread);
                ThrowIfDeadlockDetected(ms);
            }
        }
        thisThreadOwnsMonitor = Monitor.TryEnter(monitor, millisecondsTimeout);
    }
    finally {
        lock (_globalLock) {
            if (ms != null) {
                ms.WaitingThreads.Remove(Thread.CurrentThread);
                if (thisThreadOwnsMonitor) {
                    if (ms.OwningThread != Thread.CurrentThread)
                        ms.OwningThread = Thread.CurrentThread;
                    else
                        ms.ReentranceCount++;
                }
            }
        }
    }
    return thisThreadOwnsMonitor;
}

private class MonitorState {
    public MonitorState(object monitor) {
        MonitorObject = monitor;
    }
    public object MonitorObject;
    public Thread OwningThread;
    public int ReentranceCount;
    public List<Thread> WaitingThreads = new List<Thread>();
}
MonitorState stores several pieces of information about a monitor. First, it stores the monitor itself; the system uses this purely to provide diagnostic information in the event that a deadlock is detected (since it's useful for debugging to know which monitors were involved in a deadlock cycle). MonitorState also stores the Thread object for whichever thread currently owns the monitor (this value is null if it's not currently acquired), a ReentranceCount that indicates the number of times the monitor has been entered (since a thread that owns the monitor can reenter the monitor safely), and a list of waiting threads that indicates what threads are currently trying to acquire this monitor.
With a MonitorState tracking object for this monitor now in hand, TryEnter registers this thread's intent to wait on the object by adding the current Thread to the MonitorState's WaitingThreads list. Once this intent has been registered, TryEnter calls the private ThrowIfDeadlockDetected method, which does exactly what it sounds like (we'll take a look at this method's implementation shortly). If ThrowIfDeadlockDetected detects that this lock acquisition would cause a deadlock cycle, it throws an exception, preventing the lock from being acquired. Note, too, that in doing so, the finally block in which this call is wrapped will remove the current thread from the list of waiting threads.
If no deadlock is detected, it's safe for this thread to attempt to acquire the monitor, and TryEnter simply delegates to Monitor.TryEnter to enter the actual monitor. Regardless of whether the lock is acquired, the shared state lock is then reacquired. Note that I have my own lock-leveling scheme here to prevent DdMonitor from causing deadlocks (which would be a really bad thing for a tool meant to prevent them). It's OK for _globalLock to be acquired while holding a lock on a user-supplied monitor, but it's not OK for the reverse to happen. That's why it's OK for me to take the lock on _globalLock in the finally block, where I may potentially already own the user-supplied monitor, but it's not OK to attempt to acquire a lock on the user-supplied monitor while holding _globalLock, which is why I first exit the lock on _globalLock before calling Monitor.TryEnter.
At this point, regardless of whether the user-supplied monitor was successfully acquired, I remove the current thread from the monitor's waiting list (it either owns the monitor or the wait failed, so in either case it's no longer waiting). Then, if the monitor was acquired, the relevant fields are updated in MonitorState.
Exiting the lock is more straightforward, as shown in Figure 5. With the exception of the argument validation, the entire body is wrapped in a lock on the _globalLock monitor. The MonitorState for the user-supplied monitor is retrieved (if it doesn't exist, this monitor was not locked and an exception is thrown), and it's checked to see which thread owns it. If it's unowned or if the owner is not the current thread, the call to Exit was erroneous and an exception is thrown. Otherwise, if the current thread has entered this monitor multiple times, the ReentranceCount is simply reduced, and if not, the OwningThread is set to null to signify the release of this thread's lock on the monitor. Additionally, if there are no threads waiting for this monitor, the MonitorState is removed from the table so that it can be garbage collected (if we failed to do this, the table could grow unbounded over time). Finally, the method delegates to Monitor.Exit to release the actual monitor.
Figure 5 Implementing Exit
public static void Exit(object monitor) {
    if (monitor == null) throw new ArgumentNullException("monitor");
    lock (_globalLock) {
        MonitorState ms;
        if (!_monitorStates.TryGetValue(monitor, out ms))
            throw new SynchronizationLockException();
        if (ms.OwningThread != Thread.CurrentThread) {
            throw new SynchronizationLockException();
        }
        else if (ms.ReentranceCount > 0) {
            ms.ReentranceCount--;
        }
        else {
            ms.OwningThread = null;
            if (ms.WaitingThreads.Count == 0)
                _monitorStates.Remove(monitor);
        }
        Monitor.Exit(monitor);
    }
}
Now for the good part. We've seen the whole implementation, except for the actual deadlock detection algorithm, the core of which is implemented in ThrowIfDeadlockDetected (see Figure 6), which we saw called from DdMonitor.TryEnter earlier.
Figure 6 Detecting Cycles in Wait Chains
private static void ThrowIfDeadlockDetected(MonitorState targetMs) {
    if (targetMs.OwningThread == null) return;

    Dictionary<Thread, List<MonitorState>> locksHeldByThreads;
    Dictionary<MonitorState, List<Thread>> threadsWaitingOnLocks;
    CreateThreadAndLockTables(out locksHeldByThreads, out threadsWaitingOnLocks);

    Queue<CycleComponentNode> threadsToFollow =
        new Queue<CycleComponentNode>(locksHeldByThreads.Count);
    threadsToFollow.Enqueue(
        new CycleComponentNode(Thread.CurrentThread, targetMs, null));

    while (threadsToFollow.Count > 0) {
        CycleComponentNode currentChain = threadsToFollow.Dequeue();
        Thread currentThread = currentChain.Thread;

        List<MonitorState> locksHeldByThread;
        if (!locksHeldByThreads.TryGetValue(currentThread, out locksHeldByThread))
            continue;

        foreach (MonitorState ms in locksHeldByThread) {
            List<Thread> nextThreads;
            if (!threadsWaitingOnLocks.TryGetValue(ms, out nextThreads)) continue;

            foreach (Thread nextThread in nextThreads) {
                if (currentChain.ContainsThread(nextThread))
                    throw new SynchronizationLockException(
                        CreateDeadlockDescription(currentChain, locksHeldByThreads));
                threadsToFollow.Enqueue(
                    new CycleComponentNode(nextThread, ms, currentChain));
            }
        }
    }
}
The ThrowIfDeadlockDetected method accepts as a parameter the MonitorState instance that represents the monitor about to be waited on. For there to be a deadlock, this monitor (which the current thread is about to wait on, and for which we've already registered that intent) must be held by a thread that is directly or indirectly waiting on a monitor held by the current thread. By "directly or indirectly waiting," I mean that either it's waiting on such a monitor itself, or it's waiting on a monitor held by a thread that is directly or indirectly waiting on one (sorry, I like recursion). In other words, we're looking for a cycle from this monitor, to the thread that owns it, to a monitor that thread is waiting on, to the thread that owns that monitor, and so on, until we come back to the original monitor and the thread that owns it. If we can find such a cycle, we throw an exception signaling deadlock.
The implementation follows that approach. First, we check whether the target MonitorState is actually owned by a thread; if it's not, there's no problem waiting on it. We then generate lookup tables based on all of the information we currently have about which threads own and are waiting on which monitors (we already have all of this information in the _monitorStates table, so we just refactor it into two Dictionary data structures that make the rest of the algorithm easier to implement).
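CreateThreadAndLockTables itself isn't shown in Figure 6, but based on the data it needs to produce, a sketch of it might look like the following (the real version is in the code download); note that it must run while _globalLock is held so the snapshot is consistent:

// Sketch: build (thread -> locks it holds) and (lock -> threads waiting on it)
// from the _monitorStates table. The caller must hold _globalLock.
private static void CreateThreadAndLockTables(
    out Dictionary<Thread, List<MonitorState>> locksHeldByThreads,
    out Dictionary<MonitorState, List<Thread>> threadsWaitingOnLocks) {
    locksHeldByThreads = new Dictionary<Thread, List<MonitorState>>();
    threadsWaitingOnLocks = new Dictionary<MonitorState, List<Thread>>();

    foreach (MonitorState ms in _monitorStates.Values) {
        // Map each owning thread to the set of monitors it currently holds.
        if (ms.OwningThread != null) {
            List<MonitorState> held;
            if (!locksHeldByThreads.TryGetValue(ms.OwningThread, out held))
                locksHeldByThreads[ms.OwningThread] = held = new List<MonitorState>();
            held.Add(ms);
        }
        // Map each monitor to the threads currently blocked waiting for it.
        if (ms.WaitingThreads.Count > 0)
            threadsWaitingOnLocks[ms] = new List<Thread>(ms.WaitingThreads);
    }
}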
At this point, we need to start examining threads. We start with the current thread, which we know to be waiting on the supplied MonitorState. To capture this pairing, I create a simple helper container CycleComponentNode, which, in addition to storing this data, also maintains a reference to the next node in the identified cycle. When we start the algorithm, the only node we know about is the initial thread and MonitorState, so the Next value is null.
Until there are no more threads to examine, the next thread (or, rather, the CycleComponentNode storing that thread) is retrieved from the queue. Using the tables we created earlier, we retrieve all of the locks held by this thread; if it doesn't hold any locks, it can't be part of a deadlock (remember the conditions necessary for a deadlock, specifically the one about holding one resource while requesting another). Each of these locks is examined, and again using the tables created earlier, we look up what threads are waiting on these locks. If any of those threads exists in the current wait chain we've built up (represented in list form by the CycleComponentNode), we just found our deadlock! The CreateDeadlockDescription method enumerates all of the CycleComponentNode instances in the chain and generates a description such as the following (which is then thrown in an exception):
Thread 24 waiting on lockP (39C82F0) while holding lockH (36BEF47), lockK (3B3D5E3)
Thread 30 waiting on lockH (36BEF47) while holding lockB (2CF7029), lockN (349C40A)
Thread 29 waiting on lockB (2CF7029) while holding lockP (39C82F0)
If the thread doesn't exist in the current wait chain, we still need to examine it. So, we create a CycleComponentNode around it and the MonitorState it's waiting on, and its Next value is set to the current wait chain. That way, when we eventually examine this thread and all of the threads waiting on the monitors it holds, we can search the wait chain properly for those threads.
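The CycleComponentNode helper isn't shown in Figure 6 either; based on the usage just described, it might look something like this (again, the real implementation ships with the download):

// Sketch: one link in the wait chain, pairing a thread with the monitor it's
// waiting on and pointing back at the rest of the chain via Next.
private class CycleComponentNode {
    public readonly Thread Thread;
    public readonly MonitorState MonitorState;
    public readonly CycleComponentNode Next;

    public CycleComponentNode(Thread thread, MonitorState monitorState,
        CycleComponentNode next) {
        Thread = thread;
        MonitorState = monitorState;
        Next = next;
    }

    // True if the specified thread already appears anywhere in this chain,
    // meaning the acquisition being analyzed would close a cycle.
    public bool ContainsThread(Thread thread) {
        for (CycleComponentNode node = this; node != null; node = node.Next)
            if (node.Thread == thread) return true;
        return false;
    }
}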
That's it for the solution. If it doesn't meet all of your needs, you can of course modify it as necessary. For example, when blocking, you could store into the MonitorState the stack trace of the current thread, and then when a deadlock is found, include the stack traces of all relevant threads in the thrown exception's message. Or you could instead augment the solution to support additional synchronization primitives, such as System.Threading.Mutex.
Also, you should note that if you're writing native code and locking on Windows synchronization primitives, you can take advantage of the new Wait Chain Traversal APIs to perform similar and more powerful analysis. For more information regarding this, see John Robbins' Bugslayer column in the July 2007 issue of MSDN® Magazine, available at msdn.microsoft.com/msdnmag/issues/07/07/Bugslayer.
Finally, a few more thoughts on the performance of this implementation. Deadlock detection code as I've implemented it here is expensive, plain and simple. As such, you might want to use it in debug builds but not in retail builds. There are a few ways to accomplish this, but one of the simplest is to augment the implementation of DdMonitor with a static property:
private static bool _useDeadlockDetection = false;
public static bool UseDeadlockDetection {
    get { return _useDeadlockDetection; }
    set { _useDeadlockDetection = value; }
}
With that, you can augment the implementation of Enter, TryEnter, and Exit to check this property and to use it to determine whether to implement the full-blown deadlock detection algorithm or to simply delegate to the corresponding methods on Monitor:
public static bool TryEnter(object monitor, int millisecondsTimeout) {
    if (UseDeadlockDetection) {
        ... // existing logic here
    }
    else return Monitor.TryEnter(monitor, millisecondsTimeout);
}
To set UseDeadlockDetection in your code, you could read a value from a configuration file, or if you wanted to set the value based on the build configuration, you could use code like this:
public static void Main(string[] args) {
    DdMonitor.UseDeadlockDetection =
#if DEBUG
        true;
#else
        false;
#endif
    ...
    // the rest of your app here
}
Setting UseDeadlockDetection to false and using DdMonitor throughout your code base won't provide identical behavior and performance to just using Monitor, but it should be very close. Just make sure that you don't change the value of UseDeadlockDetection other than at the beginning of your program; if you do change it elsewhere, you may end up using the deadlock detection for some monitor acquisitions and not for others, the dangers of which we already discussed. An alternate approach suggested by Joe Duffy would be to change the implementation of TryEnter to first avoid the deadlock detection code and delegate to Monitor.TryEnter with some configurable timeout, say 500ms (or the user-supplied timeout if it was smaller). If that call succeeds, there's no need to perform deadlock detection as the monitor was successfully acquired. If that call times out and the timeout chosen was less than the user-supplied timeout, then the full-blown deadlock detection algorithm can be run. With this, deadlocks in deployed programs would still be caught, but you wouldn't pay the cost of the detection algorithm until the first Monitor acquisition timed out. And if you pick a reasonable timeout, then it's probably timing out because of a deadlock anyway.
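A sketch of that alternative might look like the following (the 500ms threshold and the TryEnterWithDetection name are mine, standing in for the Figure 4 logic); note that a complete version would also need to record successful fast-path acquisitions in the shared tables so that detection performed by other threads still sees them:

private const int DetectionThresholdMilliseconds = 500; // illustrative threshold

public static bool TryEnter(object monitor, int millisecondsTimeout) {
    int quickTimeout = millisecondsTimeout == Timeout.Infinite
        ? DetectionThresholdMilliseconds
        : Math.Min(DetectionThresholdMilliseconds, millisecondsTimeout);

    // Fast path: most acquisitions succeed quickly and never pay for detection.
    if (Monitor.TryEnter(monitor, quickTimeout)) return true;

    // The quick attempt already consumed the caller's entire timeout.
    if (millisecondsTimeout != Timeout.Infinite && quickTimeout >= millisecondsTimeout)
        return false;

    // Slow path: the lock is contended; run the full detection logic for the
    // remaining time (TryEnterWithDetection stands in for the Figure 4 code).
    int remaining = millisecondsTimeout == Timeout.Infinite
        ? Timeout.Infinite
        : millisecondsTimeout - quickTimeout;
    return TryEnterWithDetection(monitor, remaining);
}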
Send your questions and comments for Stephen to netqa@microsoft.com.
Stephen Toub is a Senior Program Manager on the Parallel Computing Platform team at Microsoft. He is also a Contributing Editor for MSDN Magazine.