Read-only and threadsafe are different

Artikel
05/23/2011

Here's a common problem that we face in the compiler realm all the time: you want to make an efficient immutable lookup table for mapping names to "symbols". This is in a sense the primary problem that the compiler has to solve; someone says "x = y + z;" and we have to figure out what "x", "y" and "z" mean before we can do any more analysis. An obvious way to do that is to figure out all the name-to-symbol mappings for a particular declaration space once, ahead of time, stuff the results into a lookup table, and then use that lookup table during the analysis.

The lookup table can be immutable because once it is built, it's built; no more symbols are going to be added to it. It's going to be used solely as a lookup mechanism. A common and cheap way of doing that is to use what I whimsically call "popsicle" immutability: the data structure is a fully mutable structure until you freeze it, at which point it becomes an error to attempt to change it. This technique differs markedly from the sort of "persistently re-usable data structure" immutability we've talked about in the past, where you want to reuse existing bits of the data structure as you build new, different data structures out of it.

For example, we might write up a really simple little hash table that supports "freezing". Something like this:

abstract class Symbol
{
public string Name { get; protected set; }
}

sealed class SymbolTable
{
private bool frozen = false;

private class BucketListNode
{
public Symbol Symbol { get; set; }
public BucketListNode Next { get; set; }
public BucketListNode Prev { get; set; }
}

private BucketListNode[] buckets = new BucketListNode[17];

private int GetBucket(string s)
{
return Math.Abs(s.GetHashCode()) % buckets.Length;
}

public void Add(Symbol symbol)
{
// Omitted: error checking for null symbol, null name, duplicated entry, and so on.
if (this.frozen)
throw new InvalidOperationException();
int bucket = GetBucket(symbol.Name);
var node = new BucketListNode();
node.Symbol = symbol;
node.Prev = null;
node.Next = buckets[bucket];
if (node.Next != null)
node.Next.Prev = node;
buckets[bucket] = node;
}

public void Freeze()
{
this.frozen = true;
}

public Symbol Lookup(string name)
{
// Omitted: error checking
int bucket = GetBucket(name);
for (var node = buckets[bucket]; node != null; node = node.Next)
{
if (node.Symbol.Name == name)
return node.Symbol;
}
return null;
}
}

Very simple. We have an array of seventeen doubly-linked lists. We balance the table by choosing which of the seventeen linked lists to go with by hashing the name of the symbol. That way our lookup cost is one-seventeeth the cost of doing a straight lookup in a complete list.

Now of course there are other ways to do this efficiently. We could be sorting the list of symbols and doing a binary search. We could be re-hashing into a larger array of bucket lists if it turns out that there are thousands of symbols in this table. For the sake of argument, let's suppose that we've decided to go with a fixed-number-of-buckets hashing approach, as we've done here.

One of the much-touted benefits of immutable data structures is that they are "threadsafe". Since they cannot be written to, you'll never run into a situation where a write operation is interrupted halfway by a thread switch, causing another thread to read an inconsistent data structure. However, it is a fallacy to believe that just because a data structure does not admit any way for you change its contents, its implementation must be threadsafe!

Suppose for example we do a real-world performance analysis of our little table here and discover that in practice, this sort of pattern comes up a lot in our customers' code:

x.Foo();
x.Bar();
x.Blah();
y.Abc();
y.Def();
y.Ghi();

That is, the same identifier is looked up multiple times in a row in the context of a particular symbol table. We might decide to add a little optimization to our table's lookup method:

for (var node = buckets[bucket]; node != null; node = node.Next)
{
if (node.Symbol.Name == name)
{
MoveToFront(bucket, node);
return node.Symbol;
}
}
...
private void MoveToFront(int bucket, BucketListNode node)
{
if (buckets[bucket] == node)
return;
node.Prev.Next = node.Next;
if (node.Next != null)
node.Next.Prev = node.Prev;
node.Prev = null;
node.Next = buckets[bucket];
node.Next.Prev = node;
buckets[bucket] = node;
}

We have essentially tweaked our hash table to have an array of Most Recently Used Lists, rather than merely an array of Doubly Linked Lists. That might be a performance win in the case where the lists get long and the same identifiers tend to be looked up in clusters. (It is enough of a real-world win that in the VBScript engine, the lookup tables use a variant of this optimization.)

But you see what we've done here? Reading the table now sometimes mutates its interior structure, and that mutation is not threadsafe! Even though there is no way for the user to logically change a frozen table, we do not ensure that reading the data is threadsafe.

In practice, most data structures do not do mutations on reads, but you cannot rely upon that unless it is documented. For example, the documentation for the Dictionary class notes that a Dictionary is threadsafe for multiple readers so long as there are no writers (though of course the actual enumerators are not threadsafe objects, as they are constantly mutating; rather, the collection being enumerated is threadsafe). Unless the author of a particular object makes a guarantee to you like that, you should assume that none of the operations are threadsafe, even if they appear to be read-only.

Comments

Anonymous
May 19, 2011
Since you bring up immutability & thread safety, hopefully you can answer something I have not been able to get a solit answer on...If one creates a struct, made up of only readonly fields that are either primitive value types or other structs made of of readonly fields..... Is the resulting construct atomicly threadsafe? Consider a global instance that is being wrriten two "GlobalInstance = SomeOnherInstance;" This will involve copying the bytes from the one strucutre instance to the other. If at the same time another thread (especially in a multi-proc/multi-core) does "localInstance=GlobalInstance"....I believe it is possible that localInstance could contain part of the old data and part of the new... Good question. Putting readonly fields in a struct does not magically make the processor bus in every machine wider. If the processor can only move 32 bits at one go then the processor can only move 32 bits at one go; if you have more bits than that to move, then they'll clearly be non-atomic operations. Many people are unclear on the relationships amongst thread safe, readonly, volatile, atomic, and so on. I'll talk more about this in the next few episodes. -- Eric
Anonymous
May 23, 2011
I'm going to assume that my first comment was eaten, since I got no feedback at all, in contrast to my followup query that got a nice "may take a few minutes" message. I recently encountered a similar problem. As far as I can tell from the documentation, MulticastDelegate makes no claim to be thread-safe. It seems that you can't safely do anything with the same instance of a delegate. Even though removing one delegate from another creates a fresh new delegate, it seems there's no guarantee that you can perform this operation if another thread might be touching the same instance of the delegate, since as you suggest they might have hidden mutable state. Is this a correct reading? It seems quite an onerous requirement to say that it's never safe to use the same delegate instance from different threads without locking. I believe Mono's implementation of MulticastDelegate actually does have hidden mutable state. I'm not entirely sure when it is modified. It might be that it's used "popsicle" style, and never mutated once a delegate has been exposed to client code.
Anonymous
May 23, 2011
This is the difference between logical immutability and physical immutability. If an object has physical immutability it is thread-safe AND it is logically immutable as well. If the object has logical immutability it's not necessarily thread-safe. Care has to be taken to make the physical mutations thread safe.
Anonymous
May 23, 2011
Eric, Thanks for the confirmation [of non-atomicity for structs bigger than the processor bus... I just was not sure if there was some type of synchronization burried deep inside the CLR with respect to struct assignments. Without it, immutable reference types are actually easier to treat atomically since the assignment of a new instance to a variable is alwas an atomic action.
Anonymous
May 23, 2011
Not sure if my comment was eaten, too. It revolved around wondering if there's value in having the compiler guaranteeing the thread safety of something.
Anonymous
May 23, 2011
C++ community uses beautiful metaphor for this case: physical and logical constantness. Calling constant member function does not mean that internal object state did not change. It’s still possible that this constant member function uses mutable fields for caching and therefore changed internal (physical) object state. And here we have the same situation: immutable objects could be logically and physically immutable. But only physically immutable reference types are thread safe.
Anonymous
May 23, 2011
@Sergey: "But only physically immutable reference types are thread safe." ... are automatically threadsafe, I'm sure you meant to say. Any object, mutable or otherwise, might be threadsafe - if it's been specifically designed as such.
Anonymous
May 23, 2011
@CarlD: yes, of course.
Anonymous
May 23, 2011
hhmm.... it's not read-only if it writes when you read from it
Anonymous
May 23, 2011
The comment has been removed
Anonymous
May 23, 2011
Eber: If I can defragment a read-only file, does that mean my file isn't really read-only?
Anonymous
May 24, 2011
is of much interest .. many junior developer has much use of VOLATILE thinking is that it puts "thread safety" like the magic i try to explain but is all a loss .. i will forward articel
Anonymous
May 24, 2011
Gabe: If the garbage collector moves a read-only object, does it mean my object isn't really read-only? Etc.
Anonymous
May 24, 2011
C# needs the const = (expression) syntax found in C++ to freeze an expression in the middle of a method.
Anonymous
August 28, 2012
Isn't inmmutability a stronger concept than read only?... the optimization on the look up method seems to berak the inmmutability property of the data structure, meaning that it is possible to mutate it's inner state representation, even if the client can't see any real change through the data structure exposed API. Don't you think?

Delen via

Read-only and threadsafe are different

Comments

Aanvullende resources