Share via


Welcome to the Transactional Memory Team's Blog!

If you have been using the Parallel Extension CTP or simply writing multi-threaded code yourself, you probably have run into situations where you needed to share data between multiple threads. So long as the data is read-only, this isn’t a problem, but what about mutating data?

The easy answer is to use a lock. There are a lot of blog entries and white papers talking about how to use locks correctly, how to avoid deadlocks, or what are the best locks for the particular scenario, or even how to correctly write lock-free code. You could read all of these and still run into trouble using locks. You see, the problem isn’t sharing one piece of data; it’s when you are sharing multiple pieces of data – for instance data that has a complex schema involving multiple complex objects such as trees or lists.

So, locks are a basic tool in your arsenal. From the simple lock, you can build synchronization mechanisms that hopefully protect your data, correctly, and don’t impact your scalability.

Ah, I hear you sigh. Yes, hope is eternal, software has bugs and multithreaded software has race conditions, deadlocks, and scalability problems. Why? Well, because many find it is hard and fraught with peril to correctly use anything more than a single lock or at most some really small set of course-grained locks. As code matures, locking hierarchies to provide fine-grained-locking often morph from elegant to clumsy. You may also find that as your project grows, lock depth blossoms, unnecessarily, or alternatively race conditions are introduced simply because programmers were unaware that it necessary to lock a specific resource. The end result is code that simply doesn’t scale or your application’s reliability plummets without some of your best and brightest spending time tuning, fixing, “right-sizing” and eliminating locks. Even after all that work, are you confident that your code is bug-free? Do race conditions exist in it? How can you even inspect the code and reason about its correctness?

So, I suspect you are living with some rather course-grained locks or with fear.

Many of my coworkers argue quite persuasively that you should get rid of locks by decomposing the data such that there is no sharing. They point to the use of “agents” and “messaging” as a good programming paradigm to ensure that data accessed by a specific thread is logically isolated from other threads. This is a great method but can you re-architect your code to do this? Can you use it over multiple data structures? Can you compose them together?

The jury is still out, but I suspect that not all programming patterns in traditional object-oriented languages can be decomposed to a point where there is no sharing. I find that advocates of the agent model still have concurrency problems! They have shared data within their agents and they are hampered by either requiring one thread per agent or to use locks or interlocked operations to manage this shared state.

What would you say if I told you that you could reason about your code in a sequential fashion and be relieved of worrying about other code interacting with its data? Oh, let’s say it’s in a way that scales? Would that help you?

Well, that is what transactional memory (TM) promises to provide.

Transactional memory is not about “removing locks” but is about abstracting away the requirement to specify a particular lock. Instead, you can structure your code in well defined sequential blocks of code, what in the database world we call “units of work”, and then let the underlying runtime system, compiler, or hardware provide you the guarantees you desire. Further, you want this work to scale. To do that, the underlying system provides concurrency control optimistically. Instead of always locking a resource, the transactional memory system assumes that there is no contention. Instead, it detects when these assumptions are incorrect and rolls back changes that were made in the block. Depending on the implementation, the transactional memory system may then re-execute your block of code.

In this way, transactional memory provides an execution pattern of multithreaded code that can be reasoned about sequentially and when there is very little contention, scales well. What about when there is contention? Well, “it depends” will likely be the answer. How big the transaction is and how often it re-executes (due to contention) changes the performance curves significantly. Transactional Memory systems usually include some code to manage contention. That contention manager will significantly impact how code scales under contention. We would hope that the resulting performance is not worse than fully serialized execution.

So what does this look like? Various experimental TM systems have been produced. Some integrate with the compiler, others, like SXM, provide library support. When I talk about TM and show examples I prefer to use a simple syntax which delineates a block of code as “atomic”. For instance:

                Atomic {

                                myBigCollection.insert(someStuff);

                                if (condition)

                                                otherCollection.mutate (myBigCollection);

}

 

So, in this example I have two collections and some operation that may optionally work on one depending on the other. Obviously, my code is demonstrating composability over these two collections. What might not be so obvious is that I didn’t have to catch any errors here – my code completed successfully or an exception was thrown and nothing was done. How would I do that with locks now?

 

                Lock (myCollectionLock) {

                                Bool a = false;

                                Bool b = false;

                                try {

                                myBigCollection.insert(someStuff);

                                a = true;

 

                                if (condition) {

                                                otherCollection.mutate (myBigCollection);

                                                b = true;

                                }

                Catch (exception e) {

                                If (a) myBigCollection.remove(someStuff);

                                If (b) myStaticStuff.unmutate(otherCollection, myBigCollection);

// I probably had to write my own unmutate

Throw e;

                }

} // lock

 

Which one do you want to maintain or test? Oh, did you catch that I had to write my own ‘unmutate’? Ugh, I hope that is possible. Also, did you note that I had to keep the lock through the catch? If I didn’t, there would be a race condition when a failure occurred and the work undone. Heck, just setting the flags is error prone. Consider what would have happened if I set “b = true” but forgot to correctly enclose the “if (condition)” clause.

Could your test team find this bug?

                                if (condition)

                                                otherCollection.mutate (myBigCollection);

                                                b = true;

 

This blog is about transactional memory with entries written by the team at Microsoft that is implementing an experimental transactional memory system in .NET. While we work in the Developer Division’s Parallel Computing Platform product group, we are not working on a product release at this time. Instead we have been incubating, researching, and experimenting with transactional memory.

Our goal is to create a TM system that lives up to its promise. This work has lead us to what we think is some industry-leading work – but we have been so busy doing it that we haven’t had a chance to talk about it; share what we have found; and more importantly hear what the community thinks of this effort.

This blog is our first step to remedy that. Welcome to the Transactional Memory blog here at MSDN.

Comments

  • Anonymous
    October 07, 2008
    PingBack from http://www.easycoded.com/welcome-to-the-transactional-memory-teams-blog/

  • Anonymous
    October 12, 2008
    The Parallel Computing Platform team at Microsoft is working on much more than Parallel Extensions to

  • Anonymous
    October 13, 2008
    Wow. If you guys could pull off something like that, it would help us greatly with our server scalability. Looking forward to seeing what you guys come up with.

  • Anonymous
    October 14, 2008
    How are you going to handle the composability problem of STM? If you have several transactional memory locations, modifications to each of which should logically be nested transactions of an outer transaction, there's not going to be any help from a compiler or language to tell you about the inconsistency. Won't you just have the same problem as locks, wherein programmers still need to know all the details of where to put their atomic blocks as opposed to their locking regions, and will typically punt to one-giant-transaction with similar serialization effects as one-giant-lock? How are you going to handle the I/O and general global mutable memory problem? Only transactional memory can be rolled back and retried. Once the bits have escaped I/O, there's no getting them back. How are you going to deal with predictability of performance? A transaction could be stuck in a retry / abort / rollback loop for quite a while. Without seriously good visibility into this, we're not going to be much better off. If people start having problems with their transactions being too large, how are you going to help them in indicating where they should be split up? How are you going to deal with the basic namespace / affinity problem? The problem is shared mutable state, blobs of data floating in the heap, usually. But atomic regions only help you with whatever transactional memory locations get accessed during the transaction; they don't help the programmer reason clearly about the shared state. I'm interested in seeing what you present, but I do hope you don't oversell developers on STM.

  • Anonymous
    October 14, 2008
    The comment has been removed

  • Anonymous
    October 15, 2008
    You a bit cheating here. The code must look not like: Lock (myCollectionLock) {    Bool a = false;    Bool b = false;    try {        myBigCollection.insert(someStuff);        a = true;        if (condition) {            otherCollection.mutate (myBigCollection);            b = true;    }    Catch (exception e) {        If (a) myBigCollection.remove(someStuff);        If (b) myStaticStuff.unmutate(otherCollection, myBigCollection);        // I probably had to write my own unmutate        Throw e;    } } // lock but: Lock (myCollectionLock) {    myBigCollection.insert(someStuff);    if (condition) {        try {            otherCollection.mutate (myBigCollection);        }        Catch (exception e) {            myBigCollection.remove(someStuff);            Throw e;        }    } } // lock Not need for 'unmutate', not need for flags. And in C++CLI it will look just like: Lock l (myCollectionLock); myBigCollection.insert(someStuff); remover r (myBigCollection, someStuff); if (condition)    otherCollection.mutate (myBigCollection); r.dismiss();

  • Anonymous
    October 16, 2008
    If the "mutate" operation is atomic; yes you are correct.  I wasn't assuming that it was atomic.  I was assuming that the mutation may have partially completed before the exception was thrown. Oh, and how do I guarantee that the mutation operation is atomic?  What happens if I didn't write that code and its documentation doesn't specify what happens in case of an error? Now add a third operation.  Or a loop that works over a collection or...   So, while this trivial example could be solved different ways, you can see that it is simply demonstrating the underlying complexity we see in many of today's programs.  Which is that going from one "consistent" state to another "consistent" state may require more than one change.  If failures happen, you have to undo each of those changes.  You have to write code to do this (and it may not always be trivial), you have to test that code, and you have to maintain it.

  • Anonymous
    October 17, 2008
    Re: Oh, and how do I guarantee that the mutation operation is atomic?... It has nothing to do with multi-threading. If the operation is not atomic that it's busted, then I am not able to just call it even in single-threaded program. Btw, it raises interesting question and possible usage for TM. Can I use TM in single-threaded program just to "rollback" changes in case of exception? I.e. let's assume I have code "array.sort();". Let's assume sort() provides only basic exception safety, i.e. if exception is thrown by copy ctor of element, then array will stay in some intermediate state. Well, it seems that I can just wrap some complicated operation in atomic block to achieve strong exception safety: "atomic { array.sort(); }" - ok, now array will be either in initial state, or in final state, no intermediate states no more. Hmm... however it can be very expensive mean indeed, if we will consider size of readset and size of writeset. On the other hand, it's extremely simple. I am also curious about implementation details of your STM. Do you have plans to publish some papers or just to describe details in blog? I mean things like eager vs lazy writes, or implementation of privatization safety, or usage of Bloom-filters/timestamps for validation etc.

  • Anonymous
    October 17, 2008
    The comment has been removed

  • Anonymous
    October 18, 2008
    The comment has been removed

  • Anonymous
    October 18, 2008
    The comment has been removed

  • Anonymous
    June 03, 2009
    Earlier today I saw this post about implementing locks with a timeout which claimed to have a thread