Transactional memory in the real world
(by Sasha Dadiomov)
Isn’t it a common phenomenon that each thing has many faces? If you have been following the transactional memory community for some time, you have probably seen it as a pretty theoretical area. There is obviously a lot of science here – discussions about things like “AME calculus” don’t leave much doubt. But there is more to be addressed before transactional memory becomes useful in practice… Happy Users = Great Theory + Good Implementation + Something More. This “Something More” is what I will try to explore in this blog.
So, imagine you have a wonderful implementation of Transactional Memory: correct, safe, scalable and quick. Will it be picked up by the world’s developers? Well, let’s pretend to be one of them.
On day one, you wrote a few samples, played with performance, and became excited. On day two, you started to use it in a new feature of your current hairy project. I bet your new feature will not work perfectly after your last keystroke from yesterday’s frantic coding session… well, how about debugging it? Here, a naïve vendor of transactional memory may let you down. First, there may be multiple re-executions. Do you want your debugger to step through them all? And don’t forget that, with optimistic concurrency control, the state seen before validation (which typically happens at the end of the transaction) may be inconsistent. This means you may see values that just don’t make sense. Yes, eventually the TM will notice it, roll the state back, and re-execute, but you don’t know that yet… you are sitting in front of the debugger, seeing x equal to 1 right after the previous line executed x = 2. Isn’t it confusing? Even if you were warned about optimistic concurrency control, you may find it hard to reason about your program through this distorted mirror…
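To make that “distorted mirror” concrete, here is a minimal sketch using GHC Haskell’s STM library – one real optimistic implementation with lazy, commit-time validation. The post itself is about a .NET-style TM, so treat this purely as an analogy; the variable names are made up. Every committed state keeps a == b, yet a concurrently running, not-yet-validated reader can transiently observe the two differ:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever)

main :: IO ()
main = do
  a <- newTVarIO (0 :: Int)
  b <- newTVarIO (0 :: Int)

  -- Writer thread: every *committed* state satisfies the invariant a == b.
  _ <- forkIO $ forever $ atomically $ do
    x <- readTVar a
    writeTVar a (x + 1)
    writeTVar b (x + 1)

  -- Reader transaction: with commit-time validation it may read 'a' from one
  -- committed state and 'b' from a later one, transiently seeing a /= b.
  -- Validation catches this and re-executes, so the inconsistency never
  -- commits -- but a breakpoint between the two reads is exactly where a
  -- naive debugger would show the "impossible" values described above.
  atomically $ do
    x <- readTVar a
    y <- readTVar b
    check (x == y)        -- refuse to commit an inconsistent pair
  putStrLn "reader committed a consistent snapshot"
```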
Well, this problem can be solved. The debugger may be taught to validate the transaction before stopping at a breakpoint or stepping, so you will not see inconsistent state. But it means that the TM package must include a modified debugger. By the way, there are more things to fix in the debugger. For instance, a shadow-copy-based TM implies that there will be multiple per-transaction states of the data, so the debugger must show the correct one. Consider debugging a program with two threads: in thread A, with no transaction, the value of x is 1; but thread B’s transaction may be setting x to 2. When you view the value of x under the debugger, you want to see 2 when stopped in thread B, but probably 1 when stopped in thread A (showing 2 is problematic, since B’s transaction hasn’t committed yet – what if in the next re-execution it sets x to 3? Changes should be invisible outside the transaction until commit). Even validating transactions in the debugger is not enough: it additionally has to navigate between global and in-transaction states to show the correct values. An alternative solution would be to stop all threads except one while debugging, but this is like looking for your wallet under the lamppost – you don’t want to change the real scenario for the sake of debugging ease.
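The “whose value does the debugger show” question is really about isolation: a transaction’s writes must stay private until commit. Here is a small sketch of exactly the two-thread scenario above, again using GHC Haskell’s STM; the `go` flag and the delays are artificial scaffolding so the example sequences deterministically enough to run:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

main :: IO ()
main = do
  x  <- newTVarIO (1 :: Int)
  go <- newTVarIO False

  -- "Thread B": one transaction that sets x to 2, then blocks (retries)
  -- until 'go' is flipped. While it is blocked, the write to x exists only
  -- in B's own transactional view.
  _ <- forkIO $ atomically $ do
    writeTVar x 2
    g <- readTVar go
    check g

  threadDelay 100000                  -- crude: give B time to start and block
  v <- readTVarIO x                   -- "Thread A", outside any transaction
  putStrLn ("A sees x = " ++ show v)  -- prints 1: B has not committed

  atomically (writeTVar go True)      -- let B's transaction commit
  threadDelay 100000
  v' <- readTVarIO x
  putStrLn ("After B commits, x = " ++ show v')  -- prints 2
```

A TM-aware debugger has to make the same distinction the program does here: show 2 only from inside B’s transaction, and 1 everywhere else until B commits.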
OK, the previous paragraph was intended to show that the TM vendor has to deliver a debugging solution along with the TM itself. The same may apply to a profiler: while a sampling profiler will probably work as usual, an instrumenting one may get fooled by re-executions. And there are new, all-important questions now – e.g. which variables cause most of the conflicts? These are your contention points, and you will probably change your design to ease the bottleneck. Most of the execution time may go into re-execution because a single variable is being modified by several threads concurrently – but how do you find it?
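Most STMs offer no standard way to ask “which variable caused the retries”, but even a crude measurement of wasted work helps. The sketch below is an assumption-laden hack on top of GHC Haskell’s STM: it uses `unsafeIOToSTM` (a real but deliberately unsafe escape hatch in `GHC.Conc`) to count how many times a transaction body starts, including executions that are later thrown away; `atomicallyCounted` is a name invented here:

```haskell
import Control.Concurrent.STM
import Data.IORef (IORef, atomicModifyIORef')
import GHC.Conc (unsafeIOToSTM)

-- Count how many times a transaction body *starts*. Comparing this with the
-- number of successful calls approximates how much work is being re-executed;
-- bisecting the body then narrows down which variable is contended.
-- (The counter bump escapes the STM sandbox on purpose, so it survives even
-- for executions that are later discarded -- which is exactly what we count.)
atomicallyCounted :: IORef Int -> STM a -> IO a
atomicallyCounted attempts body =
  atomically $ do
    unsafeIOToSTM (atomicModifyIORef' attempts (\n -> (n + 1, ())))
    body
```

This is a diagnostic trick, not a substitute for the profiler integration the post is asking for.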
And what about tracing? Do you want to see traces from all re-executions, or only from the ones that eventually committed? Do you want to see traces and events that contain inconsistent values, or will they just distract your attention from the real problems when you read them?
Well, the picture is clear… changing the fundamental behavior never goes unpunished: the whole ecosystem of tools needs to follow the change. Tooling is the first element of “Something More”.
Now let us look from a totally different perspective: how do atomic blocks co-exist with other facilities used in your program?
For starters, what about your old traditional transactions (e.g. MSDTC or System.Transactions on Windows)? After all, these were in wide use before the inventors of transactional memory went to school. Developers worldwide know what transactions are; it would be utterly misleading if TM transactions behaved differently. It would also be a pity if these two kinds of transactions were not usable together. Both strive for atomicity and isolation: everything inside a transaction happens all-or-nothing, and the world is isolated from intermediate states. The difference is the domain: TM serves memory, while traditional transactions typically serve databases, queues, files, and so on – but usually not memory! Why not? Wouldn’t it be beneficial to have a guarantee that all your operations inside a transaction – database, networking, memory, etc. – are atomic and isolated?
I personally think it could make programs much simpler by eliminating the need to handle every possible combination of failures. Imagine a program that executes some algorithm in memory, but also needs to keep a reliable and consistent reflection of the in-memory elements in a database. Your code will change data in memory and in the database; you want them to always match, i.e. you cannot tolerate memory changes that were never recorded in the database. To achieve this, you will need to roll back memory changes in case of database failures. But “roll back” smells familiar… isn’t that exactly what TM knows how to do? And aren’t system transactions designed to serve multiple resources? So why can’t memory be one of them? It seems we will miss a lot of potential benefits if we don’t integrate these technologies. Moreover, TM without integration with system transactions runs the risk of confusing people, who will keep asking “which transaction did you mean?” TM vendors might have to avoid the word “transaction” altogether... keeping in sync with the programmer’s mindset is another part of the “Something More”. Please note, though, that it would be bad to sacrifice the performance of pure memory scenarios for the benefit of the wider ones; this makes the implementation even harder… TM should work optimally by itself and “better together” with system transactions.
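To see why the integration matters, here is a rough sketch of what the “by hand” version looks like without it, again in GHC Haskell’s STM. The function name `updateBoth` and the `persist` callback (standing in for a database write that may fail) are hypothetical; the point is the manual snapshot-and-compensate dance:

```haskell
import Control.Concurrent.STM
import Control.Exception (SomeException, throwIO, try)

-- Without one transaction spanning both memory and the external resource,
-- the program stitches atomicity together by hand: snapshot the old value,
-- attempt the external write, and compensate on failure.
updateBoth :: TVar Int -> (Int -> IO ()) -> Int -> IO ()
updateBoth var persist newValue = do
  oldValue <- atomically $ do
    old <- readTVar var
    writeTVar var newValue
    return old
  result <- try (persist newValue) :: IO (Either SomeException ())
  case result of
    Right () -> return ()
    Left err -> do
      atomically (writeTVar var oldValue)  -- manual "rollback" of the memory change
      throwIO err
```

Note the window between the memory commit and the compensating write-back, during which other threads can observe a value the database never accepted; closing that window is precisely what a single transaction spanning memory and the other resource managers would buy.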
OK, we started speaking about co-existence with other features. Let’s look now at the numerous I/O calls scattered around your program – or more broadly, any calls with side effects (e.g. killing another process). What if your newly introduced atomic block includes some of them? Well, a naive TM could possibly re-execute them, e.g. repeat your printf multiple times. Even worse – since it is now possible for the program to run on inconsistent (read: random) data, awful things may happen… for instance, “if (x) FormatDisk()” may actually be called with a random value of x! This is a well-known TM problem, and any TM has to take care of it. Usually the “care” is red tape: just forbid it. Very limiting… it makes one doubt whether one wants to use TM at all… so the next variant is “irrevocability”: whenever a transaction attempts its first I/O, it switches from speculative mode to a global lock – no re-executions are possible anymore. The cost is serialization – all other transactions in the system will be required to take the global lock, or be stopped. This approach is viable, but it should probably be the last line of defense, since it hurts scalability so much. Integration with system transactions opens several better avenues, utilizing the resource manager’s ability to postpone or roll back its actions (e.g. easily defining a custom resource manager around the necessary side-effecting actions). Also helpful would be the ability to simply defer some actions until commit time, or to register compensations for the rollback case. And any approach will require some checks – which APIs are callable under a transaction, and which should be forbidden.
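As a small illustration of the “defer until commit time” avenue, here is a sketch in GHC Haskell’s STM – which, incidentally, takes the “just forbid it” route by construction, since the type system keeps I/O out of transactions. The transaction only returns a description of its side effect, and the caller performs it after a successful commit; `transferWithReceipt` and its receipt message are invented for the example:

```haskell
import Control.Concurrent.STM

-- Alternative to forbidding I/O outright or falling back to a global lock:
-- the transaction *describes* its side effect, and the caller runs it only
-- once the transaction has actually committed.
transferWithReceipt :: TVar Int -> TVar Int -> Int -> IO ()
transferWithReceipt from to amount = do
  deferred <- atomically $ do
    f <- readTVar from
    writeTVar from (f - amount)
    t <- readTVar to
    writeTVar to (t + amount)
    -- Printing here is impossible in GHC's STM (by the types) and unsafe in
    -- any optimistic TM that may re-execute; return the action instead.
    return (putStrLn ("transferred " ++ show amount))
  deferred   -- runs exactly once, only after a successful commit
```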
Well, we have mentioned a lot of things that a shipping TM has to address. There is more, of course; what I wanted to show was just another face of transactional memory, the “outer” one – its integration with the development and debugging environment and with existing concepts and bodies of code. Compatibility with the environment is that mysterious “Something More” from the beginning of this blog.
Comments
Anonymous
January 08, 2009
I have two comments. The first one is about I/O calls from inside transactions. It is not my main focus of work, but at first sight it seems that they shouldn't be allowed, since we are performing transactions on memory (that's why it's called transactional memory, after all). Even if you consider that most I/O operations are buffered, the problem still persists. My second comment is actually a question. Have any of you read the experiences related by a group from IBM about STM ("STM: why is it only a research toy?", ACM Queue)? I talked to one of the authors (Siddhartha) at a recent conference and he looked quite skeptical about the future of both HTM and STM. Although this post discusses some related issues, how do you guys see the future of STM in more general terms? Or in other words, have you found an application domain where STM really seems promising in terms of programmability and/or performance? -Alex.
Anonymous
January 09, 2009
Hello Alex, let me answer the first of your questions (about I/O); we will answer the second one separately. Yes, in academic settings transactional memory is usually associated with memory only. But that does not seem to be sufficient in practical applications of TM: in many real scenarios, memory actions go along with logging, communication, database operations, and so on. You cannot always easily separate memory changes from these actions; we also think that in many cases you actually don't want to separate them at all. Failure atomicity of a combined memory / non-memory operation may be a very desirable reliability feature – it makes error processing much more straightforward. For instance, if your program controls some real-world process or a game, it probably has a model of that process in memory, and it probably wants to keep that model in sync with reality. That would mean that sending a control signal to the process and reflecting it in memory should be parts of the same transaction. Thanks, Sasha