Concurrency Hazards

Solving 11 Likely Problems In Your Multithreaded Code

Joe Duffy

Concurrency is everywhere. Server-side programs have long had to deal with a fundamentally concurrent programming model, and as multicore processors become more commonplace, client-side programs will have to as well. Along with the addition of concurrency comes the responsibility for ensuring safety. In other words, programs must continue to achieve the same level of robustness and reliability in the face of large amounts of logical concurrency and ever-changing degrees of physical hardware parallelism.

Correctly engineered concurrent code must live by an extra set of rules when compared to its sequential counterpart. Reads and writes from memory and access to shared resources need to be regulated using synchronization so that conflicts do not arise. Additionally, it often becomes necessary for threads to coordinate to get a certain job done.

As a direct result of these additional requirements, it becomes nontrivial to ensure that threads are continually making consistent and adequate forward progress. Synchronization and coordination deeply depend on timing, which is nondeterministic and hard to predict and test for.

What makes all of this difficult is simply that it requires a shift in mindset. There isn't a single API you can learn or a snippet of code to copy and paste; there really is a fundamental set of concepts that you need to learn and become comfortable with. It's likely that certain languages and libraries will hide some of these concepts over time, but if you're doing concurrency today, you won't have that luxury. This article describes some of the more common challenges to be aware of and presents advice for coping with them in your software.

I'll start by looking at the kinds of things that often go wrong in concurrent programs. I call these "safety hazards" because they are so easy to stumble upon and because their consequences are typically quite undesirable. These hazards cause your programs to break by either crashing or corrupting memory.

Data Races

A data race—or race condition—occurs when data is accessed concurrently from multiple threads. Specifically, it happens when one or more threads are writing a piece of data while one or more threads are also reading that piece of data. This problem arises because Windows programs (in C++ and the Microsoft .NET Framework alike) are fundamentally based on the concept of shared memory, where all threads in a process may access data residing in the same virtual address space. Static variables and heap allocations can be used to share data among threads.

Consider this canonical example:

//C#

static class Counter {
    internal static int s_curr = 0;
    internal static int GetNext() { 
        return s_curr++; 
    }
}

'Visual Basic

Friend Class Counter
    Friend Shared s_curr As Integer = 0
    Friend Shared Function GetNext() As Integer
        s_curr += 1
        Return s_curr
    End Function
End Class

The goal of Counter is presumably to hand out a new unique number for each call to GetNext. If two threads in the program both call GetNext simultaneously, however, two threads might be given the same number. The reason is that s_curr++ compiles into three separate steps:

  1. Read the current value from the shared s_curr variable into a processor register.
  2. Increment that register.
  3. Write the register value back to the shared s_curr variable.

Two threads executing this same sequence can both read the same value from s_curr locally (say, 42), increment it (to, say, 43), and publish the same resulting value. GetNext thus returns the same number for both threads, breaking the algorithm. Although the simple statement s_curr++ appears to be atomic, this couldn't be farther from the truth.
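
Expanded, the single statement behaves roughly like the following sketch, where the tmp local stands in for the processor register (the expansion is illustrative, not literal compiler output); the comments trace one interleaving of the three steps across two threads:

//C#

int tmp = Counter.s_curr;   // Step 1: thread 1 reads 42; thread 2 also reads 42.
tmp = tmp + 1;              // Step 2: each thread computes 43 in its own register.
Counter.s_curr = tmp;       // Step 3: both threads store 43, and GetNext hands out the same number twice.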

Forgotten Synchronization

This is the simplest kind of data race: synchronization is forgotten altogether. In rare situations such races can be benign, meaning that the code is still correct, but most of them represent fundamental correctness problems.

Problems like this aren't always so obvious. For example, an object may be part of some large, complicated object graph that happens to be reachable from a static variable, or that became shared when an object was passed as part of a closure while creating a new thread or queuing work to a thread pool.

It is imperative to pay attention to when an object (graph) transitions from private to shared. This is called publication and will be discussed later in the context of isolation. The reverse is called privatization—that is, when an object (graph) goes from being shared to private again.

The solution is to add proper synchronization. In the example of the counter, I could use a simple interlocked operation:

//C#

static class Counter {
    internal static int s_curr = 0;
    internal static int GetNext() { 
        return Interlocked.Increment(ref s_curr); 
    }
}

'Visual Basic

Friend Class Counter
    Friend Shared s_curr As Integer = 0
    Friend Shared Function GetNext() As Integer
        Return Interlocked.Increment(s_curr)
    End Function
End Class

This works because the update is confined to a single memory location, and because (conveniently) there is a hardware instruction (LOCK INC) that is equivalent to the software statement I'm trying to make atomic.

Alternatively, I could use a full-fledged lock:

//C#

static class Counter {
    internal static int s_curr = 0;
    private static object s_currLock = new object();
    internal static int GetNext() {
        lock (s_currLock) { 
            return s_curr++; 
        }
    }
}

'Visual Basic

Friend Class Counter
    Friend Shared s_curr As Integer = 0
    Private Shared s_currLock As New Object()
    Friend Shared Function GetNext() As Integer
        SyncLock s_currLock
            s_curr += 1
            Return s_curr
        End SyncLock
    End Function
End Class

The lock statement (SyncLock in Visual Basic) ensures mutual exclusion among all threads trying to execute GetNext, and it uses the CLR System.Threading.Monitor class under the covers. C++ programs would use a CRITICAL_SECTION for the same purpose. There is no need to go with a full lock for this particular example, but when multiple operations are involved it's seldom possible to fold them into a single interlocked operation.

Incorrect Granularity

Even if access to shared state occurs with proper synchronization, the resulting behavior may still be incorrect. The granularity must be sufficiently large that the operations that must be seen as atomic are encapsulated within the region. There is a tension between correctness and making the region smaller, because smaller regions reduce the time that other threads will have to wait to concurrently enter.

For example, take a look at the bank account abstraction shown in Figure 1. All is well, and the object's two methods, Deposit and Withdraw, appear to be concurrency-safe. Some banking application can use them without worry that balances will become corrupt due to concurrent access.

Figure 1 A Bank Account

//C#

class BankAccount {
    private decimal m_balance = 0.0M;
    private object m_balanceLock = new object();
    internal void Deposit(decimal delta) {
        lock (m_balanceLock) { m_balance += delta; }
    }
    internal void Withdraw(decimal delta) {
        lock (m_balanceLock) {
            if (m_balance < delta)
                throw new Exception("Insufficient funds");
            m_balance -= delta;
        }
    }
}

'Visual Basic

Friend Class BankAccount
    Private m_balance As Decimal = 0D
    Private m_balanceLock As New Object()
    Friend Sub Deposit(ByVal delta As Decimal)
        SyncLock m_balanceLock
            m_balance += delta
        End SyncLock
    End Sub
    Friend Sub Withdraw(ByVal delta As Decimal)
        SyncLock m_balanceLock
            If m_balance < delta Then
                Throw New Exception("Insufficient funds")
            End If
            m_balance -= delta
        End SyncLock
    End Sub
End Class

What if, however, you'd like to add a Transfer method? A naive (and incorrect) attempt would assume that, because Deposit and Withdraw are safe in isolation, they can be combined easily:

//C#

class BankAccount {
    internal static void Transfer(
      BankAccount a, BankAccount b, decimal delta) {
        a.Withdraw(delta);
        b.Deposit(delta);
    }
    // As before 
}

'Visual Basic

Friend Class BankAccount
    Friend Shared Sub Transfer(ByVal a As BankAccount, ByVal b As BankAccount, ByVal delta As Decimal)
        a.Withdraw(delta)
        b.Deposit(delta)
    End Sub
    ' As before 
End Class

This is incorrect. There is actually a period of time between the Withdraw and Deposit calls where the money is missing completely.

A correct implementation would need to acquire the locks on both a and b up front and then make the method calls:

//C#

class BankAccount {
    internal static void Transfer(
      BankAccount a, BankAccount b, decimal delta) {
        lock (a.m_balanceLock) {
            lock (b.m_balanceLock) {
                a.Withdraw(delta);
                b.Deposit(delta);
            }
        }
    }
    // As before 
}

'Visual Basic

Friend Class BankAccount
    Friend Shared Sub Transfer(ByVal a As BankAccount, ByVal b As BankAccount, ByVal delta As Decimal)
        SyncLock a.m_balanceLock
            SyncLock b.m_balanceLock
                a.Withdraw(delta)
                b.Deposit(delta)
            End SyncLock
        End SyncLock
    End Sub
    ' As before 
End Class

It turns out that this approach, while solving the granularity problem, is prone to deadlock. You'll see how to fix it later on.

Read and Write Tearing

As mentioned earlier, benign races allow you to access variables without synchronization. Reads and writes of aligned, naturally sized words—for example, pointer-sized things that are 32 bits (4 bytes) on 32-bit processors and 64 bits (8 bytes) on 64-bit processors—are atomic. If a thread is just reading a single variable that some other thread will write, and there aren't any complex invariants involved, you can sometimes skip the synchronization altogether thanks to this guarantee.

But beware. If you try this on a misaligned memory location, or a location that isn't naturally sized, you can encounter read or write tearing. Tearing occurs because reading or writing such locations actually involves multiple physical memory operations. Concurrent updates can happen in between these, potentially causing the resultant value to be some blend of the before and after values.

As an example, imagine that ThreadA sits in a loop and alternates between writing 0x0L and 0xaaaabbbbccccddddL to a 64-bit variable s_x. ThreadB sits in a loop reading it (see Figure 2).

Figure 2 Tearing about to Happen

//C#

internal static long s_x;
void ThreadA() {
    int i = 0;
    while (true) {
        s_x = (i & 1) == 0 ? 0x0L : unchecked((long)0xaaaabbbbccccddddUL);
        i++;
    }
}
void ThreadB() {
    while (true) {
        long x = s_x;
        Debug.Assert(x == 0x0L || x == unchecked((long)0xaaaabbbbccccddddUL));
    }
}

You may be surprised to discover that ThreadB's assert might fire. The reason is that ThreadA's write consists of two parts, the high 32 bits and the low 32 bits, in some order depending on the compiler. The same goes for ThreadB's read. Thus ThreadB could witness the torn values 0xaaaabbbb00000000L or 0x00000000ccccddddL.
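
If you cannot simply take a lock, one way to avoid the tearing is to route every access to the 64-bit variable through Interlocked, which performs each read and write as a single atomic operation even on 32-bit hardware. The WriteX and ReadX helpers in this sketch are illustrative, not part of the original example:

//C#

void WriteX(long value) {
    Interlocked.Exchange(ref s_x, value);   // Atomic 64-bit write.
}

long ReadX() {
    return Interlocked.Read(ref s_x);       // Atomic 64-bit read.
}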

Lock-Free Reordering

Sometimes it's tempting to write lock-free code as a way of achieving better scalability and reliability. Doing so requires a deep understanding of the memory model of your target platform (see Vance Morrison's article "Memory Models: Understand the Impact of Low-Lock Techniques in Multithreaded Apps" at msdn.microsoft.com/magazine/cc163715 for details). Failure to understand and heed these rules can lead to memory reordering bugs. These happen because compilers and processors are free to reorder memory operations in the process of making optimizations.

As an example, assume s_x and s_y are both initialized to the value 0, as you see here:

//C#

internal static int s_x = 0;
internal static int s_xa = 0;
internal static int s_y = 0;
internal static int s_ya = 0;

void ThreadA() {
    s_x = 1;
    s_ya = s_y;
}

void ThreadB() {
    s_y = 1;
    s_xa = s_x;
}

'Visual Basic

Friend Shared s_x As Integer = 0
Friend Shared s_xa As Integer = 0
Friend Shared s_y As Integer = 0
Friend Shared s_ya As Integer = 0

Private Sub ThreadA()
    s_x = 1
    s_ya = s_y
End Sub

Private Sub ThreadB()
    s_y = 1
    s_xa = s_x
End Sub

Is it possible that, after ThreadA and ThreadB both run to completion, both s_ya and s_xa will contain the value 0? This seems ridiculous on its face. Either s_x = 1 or s_y = 1 will happen first, in which case the other thread will witness the update when it gets around to its own read. Or so the theory goes.

Unfortunately, processors are free to reorder this code so that the loads effectively happen before the writes. You can get around this with an explicit memory barrier:

//C#

void ThreadA() {
    s_x = 1;
    Thread.MemoryBarrier();
    s_ya = s_y;
}

'Visual Basic

Private Sub ThreadA()
     s_x = 1
     Thread.MemoryBarrier()
     s_ya = s_y
End Sub

The .NET Framework offers this particular API, and for C++ there is the Win32 MemoryBarrier macro and related compiler intrinsics. But the moral of the story is not that you should insert memory barriers all over the place. The moral is that you should steer clear of lock-free code until you've mastered memory models, and even then tread with care.

Lock Recursion

In Windows, including Win32 and the .NET Framework, most locks support recursive acquires. That just means that, if the current thread already holds a lock and tries to acquire it again, the request will be satisfied. This makes it easier to compose bigger atomic operations out of smaller ones. In fact, the BankAccount example shown earlier depended on recursive acquires: Transfer called both Withdraw and Deposit, each of which redundantly acquired a lock that Transfer had already acquired.
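
For example, the following sketch (the Outer and Inner methods are illustrative) shows a CLR monitor being re-acquired by the thread that already owns it:

//C#

static class RecursionExample {
    private static readonly object s_lock = new object();

    static void Outer() {
        lock (s_lock) {
            Inner();        // Calls into code that acquires the same lock.
        }
    }

    static void Inner() {
        lock (s_lock) {     // Recursive acquire: the owning thread is not blocked.
            // ... do work that assumes the lock is held ...
        }
    }
}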

This can be the source of problems, however, if you end up in a situation where a recursive acquire occurs when you didn't expect it to. This can happen as a result of reentrancy, which is possible due to either explicit callouts to dynamic code (such as virtual methods and delegates) or because of implicitly reentered code (such as STA message pumping and asynchronous procedure calls). For this reason, among others, it's a good idea to never make a call to a dynamic method from within a lock region.

For example, imagine some method that breaks invariants temporarily and then calls a delegate:

//C#

class C {
    private int m_x = 0;
    private object m_xLock = new object();
    private Action m_action = ...;

    internal void M() {
        lock (m_xLock) {
            m_x++;
            try { m_action(); }
            finally {
                Debug.Assert(m_x == 1);
                m_x--;
            }
        }
    }
}

'Visual Basic

Friend Class C
    Private m_x As Integer = 0
    Private m_xLock As New Object()
    Private m_action As Action = ...

    Friend Sub M()
        SyncLock m_xLock
            m_x += 1
            Try
                m_action()
            Finally
                Debug.Assert(m_x = 1)
                m_x -= 1
            End Try
        End SyncLock
    End Sub
End Class

C's method M is meant to leave m_x unchanged when it returns. But there is a brief period of time during which m_x is incremented by one before being decremented again. The callout to m_action looks innocent enough. Unfortunately, if it is a delegate accepted from users of the C class, it represents arbitrary code that can do anything it pleases. That includes calling back into the same instance's M method. And if that happens, the assert within the finally is apt to fire; there can be multiple calls to M active on the same stack, even though you didn't make them directly, which clearly can lead to m_x holding a value greater than 1.

Deadlocks

When multiple threads encounter a deadlock, the system simply stops responding. Several MSDN Magazine articles have described how deadlocks occur and some ways of tolerating them—including my own article, "No More Hangs: Advanced Techniques to Avoid and Detect Deadlocks in .NET Apps," at msdn.microsoft.com/magazine/cc163618, and Stephen Toub's October 2007 .NET Matters column at msdn.microsoft.com/magazine/cc163352—so the discussion here will be kept light. In summary, whenever a circular wait chain is created—such that some ThreadA is waiting for a resource held by ThreadB, which is in turn waiting for a resource held by ThreadA (perhaps indirectly, by waiting for a third ThreadC or more)—it is possible for all forward progress to come to a grinding halt.

A common source of this problem is with mutually exclusive locks. In fact, the BankAccount example shown earlier suffers from this problem. If ThreadA tries to transfer $500 from account #1234 to account #5678 at the same time ThreadB tries to transfer $500 from #5678 to #1234, the code can deadlock.

Using a consistent acquisition order can avoid the deadlock, as shown in Figure 3. This logic can be generalized to something called simultaneous lock acquisition, whereby multiple lockable objects are sorted dynamically according to some ordering among locks, so that any place two locks must be held at the same time they are acquired in a consistent order. Another scheme, called lock leveling, can be used to reject lock acquisitions that are provably done in an inconsistent order; a rough sketch of this idea follows Figure 3.

Figure 3 Consistent Acquisition Order

//C#

class BankAccount {
    private int m_id; // Unique bank account ID.
    internal static void Transfer(
      BankAccount a, BankAccount b, decimal delta) {
        if (a.m_id < b.m_id) {
            Monitor.Enter(a.m_balanceLock); // A first
            Monitor.Enter(b.m_balanceLock); // ...and then B
        } else {
            Monitor.Enter(b.m_balanceLock); // B first
            Monitor.Enter(a.m_balanceLock); // ...and then A 
        }
        try {
            a.Withdraw(delta);
            b.Deposit(delta);
        } finally {
            Monitor.Exit(a.m_balanceLock);
            Monitor.Exit(b.m_balanceLock);
        }
    }
    // As before ...
}

'Visual Basic

Friend Class BankAccount
    Private m_id As Integer ' Unique bank account ID.
    Friend Shared Sub Transfer(ByVal a As BankAccount, ByVal b As BankAccount, ByVal delta As Decimal)
        If a.m_id < b.m_id Then
            Monitor.Enter(a.m_balanceLock) ' A first
            Monitor.Enter(b.m_balanceLock) '...and then B
        Else
            Monitor.Enter(b.m_balanceLock) ' B first
            Monitor.Enter(a.m_balanceLock) '...and then A
        End If
        Try
            a.Withdraw(delta)
            b.Deposit(delta)
        Finally
            Monitor.Exit(a.m_balanceLock)
            Monitor.Exit(b.m_balanceLock)
        End Try
    End Sub
    ' As before ...
End Class
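
Lock leveling can be sketched roughly as follows. The LeveledLock type and its per-thread bookkeeping are illustrative, not a framework API: each lock is assigned a positive level, and a thread may only acquire a lock whose level is strictly lower than the lowest level it already holds, so an out-of-order acquisition fails fast instead of deadlocking.

//C#

class LeveledLock {
    [ThreadStatic] private static int t_lowestHeld;  // 0 means this thread holds no leveled lock.
    private readonly int m_level;                    // Must be positive.
    private readonly object m_monitor = new object();
    private int m_previousLevel;

    public LeveledLock(int level) { m_level = level; }

    public void Enter() {
        if (t_lowestHeld != 0 && m_level >= t_lowestHeld)
            throw new InvalidOperationException("Locks acquired out of level order.");
        Monitor.Enter(m_monitor);
        m_previousLevel = t_lowestHeld;
        t_lowestHeld = m_level;
    }

    public void Exit() {   // Assumes strictly nested Enter/Exit pairs.
        t_lowestHeld = m_previousLevel;
        Monitor.Exit(m_monitor);
    }
}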

But locks are not the only source of deadlock. Missed wake-ups are another phenomenon, where some event is missed and a thread sleeps forever. This is common with synchronization events like Win32 auto-reset and manual-reset events, CONDITION_VARIABLEs, and CLR Monitor.Wait, Pulse, and PulseAll calls. A missed wake-up is usually a sign of improper synchronization, failure to retest a wait condition, or use of a wake-single primitive (WakeConditionVariable or Monitor.Pulse) when a wake-all (WakeAllConditionVariable or Monitor.PulseAll) would have been more appropriate.

Another common source of this problem is lost signals with auto- and manual-reset events. Because such an event can only be in one of two states—signaled or nonsignaled—redundant calls to set the event are effectively ignored. If code assumes that two calls to set always translate into two threads being awakened, the result can be a missed wake-up.
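
The safe pattern is to retest the wait condition in a loop and, when in doubt, wake all waiters. The hypothetical Latch type below illustrates both points; every waiter should proceed once Set is called, so PulseAll is appropriate and no wake-up can be missed (System.Threading is assumed):

//C#

class Latch {
    private readonly object m_lock = new object();
    private bool m_signaled;

    public void Set() {
        lock (m_lock) {
            m_signaled = true;
            Monitor.PulseAll(m_lock);   // Wake-all: every waiter's condition is now true.
        }
    }

    public void Wait() {
        lock (m_lock) {
            while (!m_signaled)         // Retest the condition after every wake-up.
                Monitor.Wait(m_lock);
        }
    }
}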

Lock Convoys

When the arrival rate at a lock is consistently high compared to the rate at which it can be acquired and released, a lock convoy may result. In the extreme, there are more threads waiting at the lock than can be serviced, leading to a catastrophe. This is more common in server-side programs where certain locks protecting data structures needed by most clients can become unusually hot.

For example, imagine this scenario: On average, eight requests arrive per 100 milliseconds. We use eight threads to service requests (because we're on an 8-CPU machine). Each of those eight threads must acquire a lock and hold it for 20 milliseconds before it can do meaningful work.

Unfortunately, access to this lock must be serialized, so it takes 160 milliseconds for all eight threads to enter and leave the lock. After the first thread exits, there will be 140 milliseconds before a ninth thread can access the lock. This scheme inherently will not scale, and there will be a continuously growing backup of requests. Over time, if the arrival rate does not decrease, client requests are apt to begin timing out, and a disaster will ensue.

Fairness in locks is known to contribute to convoys. The reason is that periods of time where the lock could have been made available are artificially blocked out, so that threads arriving must wait until the chosen lock owner thread can wake up, context switch, and acquire and release the lock. Windows has changed all of its internal locks to be unfair over time in order to combat this problem, and CLR monitors are also unfair.

The only real solution to the fundamental convoy problem is to reduce lock hold times and to factor your system so that there are very few (if any) hot locks. This is easier said than done, but is crucial for scalability.

Stampedes

A stampede is a situation where many threads are awakened at once, such that they all fight simultaneously for attention from the Windows thread scheduler. If you have 100 threads blocked on a single manual-reset event, for example, and you set that event … well, you're apt to have a real mess on your hands, particularly if a large portion of those threads will have to wait again.

One way to implement a blocking queue is to use a manual-reset event that is unsignaled when the queue is empty and signaled when it is non-empty. Unfortunately, if there are a large number of threads waiting during the transition from zero elements to one element, a stampede can occur. That's because only one thread will take the single element, which transitions the queue back to empty and necessarily involves resetting the event. If there were 100 threads waiting, 99 of them will wake up, context switch (and incur all those cache misses), just to realize they have to wait again.
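
A rough sketch of that design makes the problem easier to see. The NaiveBlockingQueue type below is illustrative; when the queue transitions from empty to non-empty, Set wakes every waiting dequeuer, but only one of them will find an item (System.Collections.Generic and System.Threading are assumed):

//C#

class NaiveBlockingQueue<T> {
    private readonly Queue<T> m_queue = new Queue<T>();
    private readonly object m_lock = new object();
    private readonly ManualResetEvent m_nonEmpty = new ManualResetEvent(false);

    public void Enqueue(T item) {
        lock (m_lock) {
            m_queue.Enqueue(item);
            m_nonEmpty.Set();              // Wakes *all* waiting dequeuers.
        }
    }

    public T Dequeue() {
        while (true) {
            m_nonEmpty.WaitOne();          // 100 waiters all wake up here...
            lock (m_lock) {
                if (m_queue.Count > 0) {
                    T item = m_queue.Dequeue();
                    if (m_queue.Count == 0)
                        m_nonEmpty.Reset();   // ...but only one gets the element.
                    return item;
                }
            }
            // The queue was emptied by another thread; go back and wait again.
        }
    }
}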

Two-Step Dance

Sometimes you need to signal an event while holding a lock. This can be unfortunate if the waking thread needs to acquire the lock held, because it will be awakened only to find out that it must wait again. This is wasteful and increases the number of overall context switches. This situation is called the two-step dance, and can extend far beyond just two steps if many locks and events are involved.

Condition variable support in both Win32 and the CLR inherently suffers from the two-step dance problem. It is often unavoidable, or prohibitively difficult to work around.

The two-step dance issue is even worse on a single-processor machine. When events are involved, the kernel applies a priority boost to the awakened thread. This is nearly guaranteed to preempt the thread setting the event before it has a chance to release the lock. This is the two-step dance in the extreme: the setting ThreadA is context switched out so that the awakened ThreadB can attempt to acquire the lock; it of course cannot, and so it context switches out so that ThreadA can run again; eventually, ThreadA releases the lock, which again priority boosts ThreadB, which preempts ThreadA so that it can run. As you can see, there's a lot of wasteful context switching going on.
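
The following sketch contrasts the two patterns; the SignalHolder type, its m_ready flag, and the method names are illustrative. In the first method the waiter is woken while the lock is still held and must immediately block on it; in the second the event is set only after the lock has been released:

//C#

class SignalHolder {
    private readonly object m_lock = new object();
    private readonly AutoResetEvent m_event = new AutoResetEvent(false);
    private bool m_ready;

    // Waiters call m_event.WaitOne() and then take m_lock to read m_ready.

    public void SignalWhileHoldingLock() {
        lock (m_lock) {
            m_ready = true;
            m_event.Set();    // Two-step dance: the waiter wakes, then blocks on m_lock.
        }
    }

    public void SignalAfterReleasingLock() {
        lock (m_lock) {
            m_ready = true;
        }
        m_event.Set();        // The waiter wakes and can usually take m_lock right away.
    }
}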

Priority Inversion

Modifying thread priorities is usually asking for trouble. Priority inversion can arise when many threads at different priorities share access to the same locks and resources, whereby a lower-priority thread actually indefinitely holds up the progress of a higher-priority thread. The moral of this story is to simply avoid changing thread priorities wherever possible.

Here is an extreme example of priority inversion. Imagine ThreadA at Low priority acquires some lock L. Then comes along ThreadB, which is at High priority. It tries to acquire L, but cannot because ThreadA holds it. This is the "inversion" part: it's as if ThreadA is artificially and temporarily given a higher priority than ThreadB, simply because it holds a lock that ThreadB needs.

The situation will eventually resolve itself when ThreadA releases the lock. Unfortunately, imagine what happens if ThreadC at Medium priority comes into the picture. Although ThreadC doesn't need lock L, its mere presence may prevent ThreadA from running at all, which indirectly prevents the High-priority ThreadB from running.

Eventually the Windows Balance Set Manager thread will notice this situation. Even if ThreadC remains runnable forever, ThreadA will eventually (after four seconds) receive a temporary priority boost by the OS. Hopefully this is enough to allow it to run to completion and release the lock. But the delay here (four seconds) is huge, and if there is any user interface involved, the application's user is certainly going to notice the problem.

Patterns for Achieving Safety

Now that I've chased down issue after issue, the good news is that there are design patterns you can follow to make the occurrence of many of the above issues (particularly the correctness hazards) far less frequent. The crux of most issues is that state is shared among multiple threads. What's worse, this state is freely manipulated, and it transitions from a consistent state to an inconsistent one and then (hopefully) back again with surprising regularity.

As developers write code for single-threaded programs, all of this makes sense. It's easy to use shared memory as a sort of scratch pad as you make your way towards a valid and final destination. C-style imperative programming languages have worked this way for quite a few years.

But with the rise of more and more concurrency, you need to pay closer attention to these habits. And you can take a cue from functional programming languages like Haskell, LISP, Scheme, ML, and even F# (a new .NET-compliant language), by adopting immutability, purity, and isolation as first class design concepts.

Immutability

An immutable data structure is one that doesn't change after construction. This is a wonderful property for concurrent programs because if data isn't being mutated, then there is no risk of conflict if many threads access it at once. That means synchronization isn't a concern.

Immutability has some support in C++ by way of const, and in C# by way of the readonly modifier. For example, a .NET type that has only read-only fields is shallow immutable. F# creates shallow immutable types by default, unless you use the mutable modifier. Going a step further, if each of those fields itself refers to another type whose fields are all read-only (and which only refers to deeply immutable types), then the type is deeply immutable. The result is an entire object graph that is guaranteed not to change out from underneath you, which is very useful indeed.
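
For instance, here is a minimal sketch of a deeply immutable type (the Point2D name is illustrative): every field is read-only and refers only to immutable data, so instances can be shared among threads without any synchronization.

//C#

public sealed class Point2D {
    public readonly double X;
    public readonly double Y;
    public Point2D(double x, double y) { X = x; Y = y; }

    // "Mutation" produces a new instance rather than changing this one.
    public Point2D WithX(double newX) { return new Point2D(newX, Y); }
}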

All of this describes immutability as a static property. It's also possible for objects to be immutable by convention, meaning that there is some guarantee that state won't change during certain periods of time. This is a dynamic property. The Windows Presentation Foundation (WPF) freezable feature implements precisely this, and it also allows concurrent access without synchronization (although it cannot be checked in the same way that static support can). Dynamic immutability is often useful for objects that transition between immutable and mutable throughout the duration of their lifetime.

Immutability does have some downsides. Namely, whenever something has to change, you will need to make a copy of the original object and apply the changes along the way. Additionally, cycles in an object graph are generally not possible (except with dynamic immutability).

For instance, imagine you have an ImmutableStack<T>, as shown in Figure 4. Instead of a set of mutating Push and Pop methods, you will need to return new ImmutableStack<T> objects from them that contain the applied changes. Clever tricks can be used (as with the stack) in some cases to share memory among instances.

Figure 4 Using ImmutableStack

//C#

public class ImmutableStack<T> {
    private readonly T m_value;
    private readonly ImmutableStack<T> m_next;
    private readonly bool m_empty;
    public ImmutableStack() { m_empty = true; }
    internal ImmutableStack(T value, ImmutableStack<T> next) {
        m_value = value;
        m_next = next;
        m_empty = false;
    }
    public ImmutableStack<T> Push(T value) {
        return new ImmutableStack<T>(value, this);
    }
    public ImmutableStack<T> Pop(out T value) {
        if (m_empty) throw new Exception("Empty.");
        value = m_value;
        return m_next;
    }
}

'Visual Basic

Public Class ImmutableStack(Of T)
    Private ReadOnly m_value As T
    Private ReadOnly m_next As ImmutableStack(Of T)
    Private ReadOnly m_empty As Boolean
    Public Sub New()
        m_empty = True
    End Sub
    Friend Sub New(ByVal value As T, ByVal [next] As ImmutableStack(Of T))
        m_value = value
        m_next = [next]
        m_empty = False
    End Sub
    Public Function Push(ByVal value As T) As ImmutableStack(Of T)
        Return New ImmutableStack(Of T)(value, Me)
    End Function
    Public Function Pop(<System.Runtime.InteropServices.Out()> ByRef value As T) As ImmutableStack(Of T)
        If m_empty Then
            Throw New Exception("Empty.")
        End If
        value = m_value
        Return m_next
    End Function
End Class

As nodes are pushed, you have to allocate a new object for each one. In a standard linked-list implementation of a stack, you'd have to do this anyway. Notice that as you pop elements from the stack, however, you can use the existing objects. That's because each node in the stack is immutable.
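
A brief usage sketch shows the sharing at work: pushes allocate new nodes, while pops simply hand back the existing next node, and neither operation disturbs a stack that other threads may already hold a reference to.

//C#

ImmutableStack<int> empty = new ImmutableStack<int>();
ImmutableStack<int> s1 = empty.Push(10).Push(20);

int top;
ImmutableStack<int> s2 = s1.Pop(out top);  // top == 20; s2 is the existing node holding 10.
                                           // s1 is untouched and still represents 10, 20.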

There are already immutable types running rampant in the wild. The CLR's System.String class is immutable, and there is a design guideline that all new value types should be immutable, too. The guidance being given here is to use immutability where it is possible and feels natural, and to resist the temptation to perform mutation simply because it is made convenient by the current generation of languages.

Purity

Even with immutable data types, most operations performed by a program are method calls. And method calls can have side-effects, which are problematic in concurrent code because a side-effect implies mutation of some form. Most often this will be a simple write to shared memory, but it can also be a physically mutating operation like a database transaction, Web service invocation, or file system operation. In many circumstances, I'd like to be able to invoke some method without fear that it will lead to concurrency hazards. Good examples of this are simple methods like GetHashCode and ToString on System.Object. Most people wouldn't expect them to entail side-effects.

A pure method can always be run in a concurrent setting without added synchronization. Although there is no common language support for purity, you can define a pure method very simply:

  1. It only reads from shared memory, and it only reads immutable or constant state.
  2. It can, of course, write to local variables.
  3. It may only call other pure methods.

The set of things possible within a pure method, thus, is extremely limited. But when combined with immutable types, purity can be made possible and convenient. Some functional languages assume purity by default, most notably Haskell where everything is pure. Anything that must perform a side-effect has to be wrapped in a special thing called a monad. Most of us don't use Haskell, however, so we'll have to get by with purity-by-convention.
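
For instance, a method like the following sketch qualifies as pure under those rules: it reads only its immutable string input and local variables, writes only locals, and calls only other pure methods, so any number of threads can call it concurrently without synchronization (the CountVowels name is illustrative).

//C#

static int CountVowels(string s) {
    int count = 0;                             // Local state only.
    for (int i = 0; i < s.Length; i++) {       // Reads only the immutable input string.
        char c = char.ToLowerInvariant(s[i]);  // Calls only another pure method.
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            count++;
    }
    return count;
}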

Isolation

Publication and privatization were mentioned in passing earlier, but they strike at the heart of a very important issue. Synchronization is necessary—and immutability and purity are interesting—because state is ordinarily shared among multiple threads. But if state is confined within a single thread, then there is no need for synchronization. This leads to more naturally scalable software.

Indeed, if state is isolated, mutation can happen freely. This is convenient because mutation is a fundamental built-in aspect of most C-style languages. Programmers are accustomed to it. It takes discipline to program in a mostly functional style, and is quite difficult for most developers. So give it a try, but don't deceive yourself into thinking the world will become functional overnight.

Ownership is a difficult thing to track down. When an object is being initialized, that work is typically done by a single thread, and the object is not yet reachable from other threads. Once a reference to the object is stored in a static variable, in some location that was shared at thread creation or work queuing time, or in a field of an object transitively reachable from one of these places, the object becomes shared. It is imperative that developers watch specifically for these transitions between private and shared, and treat all shared state with care.
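
As a sketch of these transitions, consider how an object that starts out private becomes shared; the s_cache field and the queued work item are illustrative (System.Collections.Generic and System.Threading are assumed):

//C#

static class PublicationExample {
    private static List<int> s_cache;          // Anything stored here is reachable by all threads.

    static void Initialize() {
        List<int> list = new List<int>();      // Private: only the creating thread can see it.
        list.Add(42);                          // Safe to mutate with no synchronization.

        s_cache = list;                        // Published via a static variable...
        ThreadPool.QueueUserWorkItem(delegate {
            list.Add(43);                      // ...and via the closure handed to the thread pool.
        });
        // From this point on, every access to the list must be synchronized.
    }
}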


Joe Duffy is the Development Lead for Parallel Extensions to .NET at Microsoft. He spends most of his time hacking code, overseeing the design of the library, and managing an amazing team of developers. His new book is Concurrent Programming on Windows.