Threading deep dive – Day 7

07/08/2008

Reads / Writes to the memory

Before we jump into atomic operations, I would like to give some background on the order of reads and writes to main memory. Chip makers are under constant pressure to make processors faster and faster. To improve processor performance, they place multiple levels of cache between the processor and the RAM.

[Refer to the diagram for a clearer understanding.] Reads and writes are not necessarily performed in the order our program issues them, even on single-processor machines. Like their single-processor counterparts, multiprocessor machines reorder reads and writes whenever that performs better. When a value is read from memory, the values adjacent to it are also brought into the processor's cache. So when the processor wants a value, it looks in its local cache first. Only if the required value is not in the local cache does the processor fetch it from main memory. In a multiprocessor scenario, there is a chance that the value in a processor's local cache is stale.

To avoid reading stale data, we use locks. What does a lock do? It instructs the processor to follow a certain order while reading data from and writing data to the RAM. The .NET 2.0 runtime instructs the processor(s) to follow the protocol below when the CLR executes the statements inside a lock block.

1. A read must never move before the point at which the lock is acquired.

2. A write must never move beyond the point at which the lock is released.

This also means that the reads and writes inside the lock block can happen in any order. This freedom is left to the processor so that it can improve its performance. When we force the processor to perform reads and writes in a certain order, we slow it down. With locks, we trade processor performance for synchronization.
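The two rules above can be illustrated with a small sketch. It does not prove anything about reordering; it only shows the usual pattern in which a writer and a reader take the same lock so that a pair of values is always seen together. The names `LockOrderingDemo`, `Gate`, `Update` and `IsConsistent` are hypothetical and chosen for this illustration only.

```csharp
using System;
using System.Threading;

// Hypothetical type for illustration; the names are not from the article.
public static class LockOrderingDemo
{
    private static readonly object Gate = new object();
    private static int _x;
    private static int _y;

    // Writer: both writes are completed no later than the lock release.
    public static void Update(int value)
    {
        lock (Gate)
        {
            _x = value;
            _y = value;
        }
    }

    // Reader: both reads happen after the lock is acquired,
    // so the reader never sees one field updated without the other.
    public static bool IsConsistent()
    {
        lock (Gate)
        {
            return _x == _y;
        }
    }

    // Runs a writer thread against a reader loop and reports
    // whether every observed pair matched.
    public static bool Run()
    {
        bool consistent = true;
        Thread writer = new Thread(() =>
        {
            for (int i = 1; i <= 100000; i++) Update(i);
        });
        writer.Start();
        for (int i = 0; i < 100000; i++)
        {
            if (!IsConsistent()) consistent = false;
        }
        writer.Join();
        return consistent;
    }

    public static void Main()
    {
        Console.WriteLine(Run() ? "pairs always matched" : "saw a mismatched pair");
    }
}
```

Inside each lock block the two assignments (or the two reads) may still be performed in either order; what matters is that neither escapes the block.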

What are atomic operations?

Operations that execute fully before the thread executing them can be preempted are called atomic operations. In other words, a thread that executes an atomic operation is not preempted before the operation completes. The Interlocked class in the .NET Framework provides atomic functionality. Atomic operations are faster than locks. They guarantee that atomic reads always come from main memory and that atomic writes are reflected in main memory immediately. One can write an effective lock with the help of the Interlocked.Exchange method. Though locks are the best fit in many places and solve most synchronization issues, their performance is not so great when compared with Interlocked.
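The remark about building a lock on top of Interlocked.Exchange can be sketched as follows. This is a minimal spin lock written under simple assumptions (no fairness, no reentrancy, busy waiting), not a replacement for the lock statement. `SimpleSpinLock` and `SpinLockDemo` are hypothetical names introduced for this example; they are not framework types.

```csharp
using System;
using System.Threading;

// Hypothetical minimal spin lock; not a type from the .NET Framework.
public sealed class SimpleSpinLock
{
    private int _held; // 0 = free, 1 = held

    public void Enter()
    {
        // Exchange atomically writes 1 and returns the previous value.
        // A previous value of 0 means this thread took the lock.
        while (Interlocked.Exchange(ref _held, 1) != 0)
        {
            Thread.SpinWait(20); // back off briefly before retrying
        }
    }

    public void Exit()
    {
        // Atomically mark the lock as free again.
        Interlocked.Exchange(ref _held, 0);
    }
}

public static class SpinLockDemo
{
    // Several threads increment one counter, each increment guarded by the spin lock.
    public static int Run(int threadCount, int incrementsPerThread)
    {
        SimpleSpinLock spin = new SimpleSpinLock();
        int counter = 0;
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                {
                    spin.Enter();
                    try { counter++; }
                    finally { spin.Exit(); }
                }
            });
            threads[t].Start();
        }
        foreach (Thread th in threads) th.Join();
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(SpinLockDemo.Run(4, 100000));
    }
}
```

Because only one thread at a time can see the previous value 0, only one thread at a time is inside the guarded section, which is exactly why lock acquisition itself has to be atomic.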

Locks vs atomic operations

Locks control instruction reordering, whereas Interlocked operations guarantee that reads from and writes to memory take effect immediately and are visible to other processors / threads that perform appropriate reads. In fact, acquiring a lock is itself an atomic operation. If it were not, more than one thread could acquire the same lock. I have seen code where people use a lock just to increment one value. For such cases, the Interlocked class is a good fit.

Refer to the code snippets below:

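// Uses the Interlocked class
// Requires: using System; using System.Diagnostics; using System.Threading;
static void Main(string[] args)
{
    int j = -10;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < Int32.MaxValue; i++)
    {
        // Atomic increment; no lock is taken
        Interlocked.Increment(ref j);
    }
    sw.Stop();
    Console.WriteLine("Total time taken {0} ms", sw.ElapsedMilliseconds);
    Console.ReadLine();
}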

// Uses locks
// Requires: using System; using System.Diagnostics;
static void Main(string[] args)
{
    object lockObj = new object();
    int j = -10;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < Int32.MaxValue; i++)
    {
        // Acquire and release the lock around every increment
        lock (lockObj)
        {
            j++;
        }
    }
    sw.Stop();
    Console.WriteLine("Total time taken {0} ms", sw.ElapsedMilliseconds);
    Console.ReadLine();
}

I wrote these code snippets to compare the execution speed of lock and Interlocked. When I ran the above snippets on my laptop [Toshiba Tecra-M5, 2 GHz, dual core, 4 GB RAM, Win2K3], these are the results I got.

Interlocked - Total time taken 64326 ms

Lock - Total time taken 141203 ms

It is clear that locks take more time. Whenever possible, try to use the methods of the Interlocked class.
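The timings above come from a single thread, so they measure only the uncontended cost of each approach. As a complementary sketch, the following shows Interlocked.Increment doing the job a lock is often used for: keeping a shared counter exact while several threads update it. `InterlockedCounterDemo` and its parameters are hypothetical names for this illustration.

```csharp
using System;
using System.Threading;

// Hypothetical demo type; the names are not from the article.
public static class InterlockedCounterDemo
{
    private static int _counter;

    // Starts threadCount threads that each perform incrementsPerThread
    // atomic increments, then returns the final counter value.
    public static int Run(int threadCount, int incrementsPerThread)
    {
        _counter = 0;
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                {
                    // Read-modify-write performed as one indivisible step
                    Interlocked.Increment(ref _counter);
                }
            });
            threads[t].Start();
        }
        foreach (Thread th in threads) th.Join();
        return _counter;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 250000));
    }
}
```

Replacing the Interlocked.Increment call with a plain `_counter++` would make the final value unreliable, because the read, the add and the write could interleave across threads.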

Atomic operations on multi-core / multi-processor machines

Though atomic operations guarantee that the latest modification is reflected in main memory and in the processor that executes them, it is the responsibility of the code to use atomic operations wherever required. Assume the following:

1. You have a two-processor machine.

2. Processor 1 executes an atomic operation on a variable var1.

3. At the same time, Processor 2 accesses var1 using a normal read.

In this case, Processor 2 may get a stale value, as it has no idea that the value of var1 has changed in memory. To make Processor 2 read the up-to-date value, the code that reads var1 on Processor 2 should do a volatile read or use a lock, because Processor 2 may hold stale data in its cache that has not been invalidated.
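The scenario above can be sketched with a writer that publishes a value and a reader that waits for it using volatile reads. This sketch assumes the `System.Threading.Volatile` class, which appeared in later framework versions (.NET Framework 4.5 onward) than the .NET 2.0 runtime discussed here; on .NET 2.0 the closest equivalent would have been `Thread.VolatileRead` / `Thread.VolatileWrite`. `VolatileFlagDemo`, `_payload` and `_ready` are hypothetical names for this illustration.

```csharp
using System;
using System.Threading;

// Hypothetical demo type; assumes System.Threading.Volatile is available.
public static class VolatileFlagDemo
{
    private static int _payload;
    private static int _ready; // 0 = not published, 1 = published

    public static int Run()
    {
        _payload = 0;
        _ready = 0;
        int observed = -1;

        Thread reader = new Thread(() =>
        {
            // A volatile read is performed on every iteration, so the loop
            // cannot keep reusing an old value of the flag.
            while (Volatile.Read(ref _ready) == 0)
            {
                Thread.SpinWait(10);
            }
            // This read cannot move before the volatile read above.
            observed = _payload;
        });
        reader.Start();

        _payload = 42;
        // The volatile write keeps the payload write from moving after it.
        Volatile.Write(ref _ready, 1);

        reader.Join();
        return observed;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

If the reader used a plain read of the flag instead, it would be free to keep working with a stale value, which is the situation described for Processor 2.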
