Parallel Extensions for .NET - Part 1

Some of my recent posts have mentioned the Parallel Extensions to the .NET Framework CTP that was released in December by the Parallel Computing Team at Microsoft.  This post is meant to give an introduction to the Parallel Extensions, as well as some resources for the project.

The Parallel Extensions consist of two main components: the Task Parallel Library (TPL) and Parallel LINQ (PLINQ).  I will be writing about the TPL initially and will get into PLINQ in future posts.

The TPL is a set of classes in the System.Threading namespace that facilitate imperative data parallelism and imperative task parallelism.  In this post I will introduce some of the imperative data parallelism concepts, such as Parallel.For() and Parallel.ForEach().

Why is there a need for parallelism in development?

Quite simply, because the resources are available to do so.  The current trend in processor manufacturing is to put more processor cores on a single chip rather than to increase the speed of a single core.  As clock speed increases, so do power consumption and heat.  Moore's law still holds with regard to the number of transistors that can be placed on a chip, so manufacturers are now putting more lower-speed cores on a single chip to minimize the power and heat side effects.  This gives us the ability to use those extra cores to do more computations in parallel.

The problem is that current compilers do not automatically take advantage of the extra cores, so most of our applications use only a small fraction of the resources available on even the lowest-end computers on the market today.

Does .NET currently support parallelism?

Yes, but it can be complicated.  All of the tools for parallelism are available in .NET, but it is left up to the developer to utilize and implement them.  We can create new threads to execute units of work in parallel, but we still have to manage the synchronization and context for those parallel operations.

Here is an example of the code that you would have to write today to implement a multi-threaded For loop.  This splits the workload based on the number of processors available and then executes chunks of the loop in parallel using the ThreadPool. 

using System;
using System.Threading;

public static void ThreadedFor(int start, int end, Action<int> action)
{
  int N = end - start;                      // Total size of the for loop
  int P = 2 * Environment.ProcessorCount;   // Typically twice the procs for distribution
  int Chunk = N / P;                        // Size of a work chunk
  int counter = P;                          // Counter of remaining work chunks
  AutoResetEvent signal = new AutoResetEvent(false);

  for (int i = 1; i <= P; i++)
  {
    ThreadPool.QueueUserWorkItem(delegate(object o)
    {
      int unit = (int)o;                     // Get the current "processor"
      for (int j = (unit - 1) * Chunk;       // Iterate through this work chunk
        j < (unit == P ? N : unit * Chunk);  // The last chunk picks up any remainder
        j++)                                 // Increment
      {
        action(start + j);    // Do the work (offset by the starting index)
      }

      if (Interlocked.Decrement(ref counter) == 0)    // Safe decrement
        signal.Set();                                 // Signal completion
    }, i);
  }

  signal.WaitOne();    // Wait for the threaded operation to complete
}

Then you could use this implementation as shown below.

// Equivalent to: for (int i = 0; i < 10; i++)
ThreadedFor(0, 10, delegate(int i)
{
  Console.WriteLine(i);
});

As you can see, this can be fairly complex to implement, especially for something as simple as a for loop.

What does the TPL give us?

The TPL gives us the ability to use simple constructs to achieve imperative parallelism in our applications.  The developers of the TPL have implemented an underlying scheduler that handles the distribution and synchronization of work across multiple threads and processors (among many other features, such as work stealing, which I will talk about a little later).  This is all transparent to the caller, so all we have to worry about is what we pass in and what we get back as output.  If the machine executing the code has multiple processors, the TPL executes the code using the available resources.  If the machine has only one core, the code executes in a serialized manner.  We as developers do not have to target our code for one situation or the other; the TPL handles both scenarios.

Here is an example of the same for loop above using the TPL:

using System.Threading;

// Equivalent to: for (int i = 0; i < 10; i++)
Parallel.For(0, 10, delegate(int i)
{
  Console.WriteLine(i);
});

Or using the new lambda expressions in .NET 3.5:

using System.Threading;

// Equivalent to: for (int i = 0; i < 10; i++)
Parallel.For(0, 10, i =>
{
  Console.WriteLine(i);
});

How nice is this?  The TPL gives us the ability to call a simple method to implement a parallel for loop.

Of course, there are limitations.  Since we are executing the iterations in parallel, we are not guaranteed any particular execution order.  So if the work being done depends on the order of the loop variable, this will not work as expected.
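To see this in action, here is a small sketch (my own example, not something from the CTP samples) that prints which thread handles each iteration.  Run it a few times and the output order will likely change from run to run:

using System;
using System.Threading;

// The iterations may complete in any order, and the order can change between runs
Parallel.For(0, 10, i =>
{
  Console.WriteLine("Iteration {0} ran on thread {1}",
    i, Thread.CurrentThread.ManagedThreadId);
});

If you need results in a deterministic order, one option is to write each result into an array slot indexed by the loop variable and process the array after the loop completes.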

As I alluded to earlier, one of the really cool features of the TPL is work stealing.  In a nutshell, each worker thread keeps its own queue of pending work; when one worker finishes its queue while another is still busy, the idle worker can steal queued work items from the busy one, so the work gets redistributed to processors as they become available.  The guys from the Parallel Computing Team go into some good detail on this in the recent Channel 9 interview.
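You can get a rough feel for why this matters with an intentionally unbalanced loop.  This is just a hypothetical workload I made up for illustration: the later iterations spin far longer than the earlier ones, so a fixed up-front split like the ThreadedFor example above would leave some workers idle while others are still grinding away, whereas the TPL's scheduler can keep handing out the remaining work.

using System;
using System.Diagnostics;
using System.Threading;

Stopwatch watch = Stopwatch.StartNew();

// Simulated work that grows with the loop index, so the chunks are very uneven
Parallel.For(0, 20, i =>
{
  Thread.SpinWait(i * 1000000);
});

watch.Stop();
Console.WriteLine("Elapsed: {0} ms", watch.ElapsedMilliseconds);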

Another method available in the TPL is Parallel.ForEach().  Here is an example:

using System.Threading;

// A simple string collection
string[] numbers = { "One", "Two", "Three", "Four", "Five", "Six", "Seven",
  "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen"};

// Equivalent to: foreach (string n in numbers)
Parallel.ForEach<string>(numbers, delegate(string n)
{
  Console.WriteLine("n={0}", n);
});
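As with Parallel.For, the same loop can also be written with the .NET 3.5 lambda syntax, and (assuming the same ForEach overload as above) the generic type parameter can be inferred from the collection.  Here is the same example again in that form:

using System.Threading;

// Equivalent to the delegate version above, using lambda syntax
Parallel.ForEach(numbers, n =>
{
  Console.WriteLine("n={0}", n);
});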

As you can see, the TPL provides us with a pretty simple way to achieve parallelism in our applications.  I encourage you to download the CTP and experiment for yourselves.

This was just a simple introduction to some of the basic features offered in the Parallel Extensions CTP.  In the next post, I will talk about the imperative task parallelism features of the TPL, such as Tasks, Futures, and Parallel.Do().  Until then, here are some good places on the web to learn more about Parallel Computing and the Parallel Extensions: