Why Task Based Programming (Parallel Extensions)
Hi,
Today is October 15th, 2009, and I am writing this post to share my understanding of parallel programming with Parallel Extensions (task-based parallel programming), and specifically why we should adopt this approach.
Ever since Parallel Extensions was announced, I had been eager to get my hands dirty with it. After working with it, though, I had a few initial questions that I think everyone will have, and there is little content online that addresses the benefits of Parallel Extensions. Below are the two questions I am targeting in this post.
- Is threading going to die? Are Parallel Extensions a replacement for threading?
- Sometimes Threading gives me better performance than Task based programming. Why?
Before we come to these questions, it is worth understanding the options we had for parallel programming before Parallel Extensions. We would primarily use one of the following two approaches:
Directly Creating Threads:
Here we create threads directly in our code, similar to the following:
Thread t = new Thread(DoSomeWorkMethod);
t.Start(someInputValue);
Benefits:
The benefit of this approach is that every thread we create gets a share of CPU time. The operating system's scheduler (not the .NET Framework itself) time-slices all runnable threads in a manner loosely similar to round-robin, though the actual algorithm is priority-based and more complex. This approach is especially useful when I want every thread to be treated fairly.
Another benefit is that we have direct control over the threads we create, such as aborting a thread in the middle of its work.
Drawbacks:
The drawback of this approach shows up when we have more threads than available CPUs (cores). As an example, if we have two cores and run 10 threads, the threads compete for the cores, and the cost of context switching, invalidation of each thread's cached data, etc. starts to eat into performance. So we need to balance the number of threads we create against the number of CPUs we have to get optimum performance.
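To make the balance between threads and cores concrete, here is a minimal sketch that creates one thread per logical core and joins them all. The class name ThreadSketch and the SumRange workload are made up for illustration; only Thread, Join, and Environment.ProcessorCount are real framework members.

```csharp
using System;
using System.Threading;

static class ThreadSketch
{
    // Hypothetical workload: sums the integers in [from, to).
    static long SumRange(int from, int to)
    {
        long sum = 0;
        for (int i = from; i < to; i++) sum += i;
        return sum;
    }

    // Splits [0, n) across one thread per logical core and adds up the partial sums.
    public static long SumWithThreads(int n)
    {
        int workers = Environment.ProcessorCount;   // one thread per core, no more
        long[] partialSums = new long[workers];
        Thread[] threads = new Thread[workers];
        int chunk = n / workers;

        for (int w = 0; w < workers; w++)
        {
            int index = w;                           // private copy for the closure
            int from = index * chunk;
            int to = (index == workers - 1) ? n : from + chunk;
            threads[index] = new Thread(() => partialSums[index] = SumRange(from, to));
            threads[index].Start();
        }

        long total = 0;
        for (int w = 0; w < workers; w++)
        {
            threads[w].Join();                       // block until this thread finishes
            total += partialSums[w];
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumWithThreads(1000000));
    }
}
```

Notice that the thread count, the partitioning, and the joining are all my own responsibility here; that bookkeeping is exactly what the next two approaches try to take off our hands.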
Using Thread Pool:
The .NET Framework has a configurable thread pool that limits the number of threads running at a time. When we want to execute something on a separate thread, we simply queue a work item to the thread pool instead of creating our own thread. The code looks similar to the following:
ThreadPool.QueueUserWorkItem(
DoSomeWorkMethod, someInputValue);
Benefits:
The thread pool size can be set with a simple calculation based on the number of CPUs the machine has. The thread pool then makes sure to run only as many threads as is optimal for that machine. This reduces (though does not eliminate) context switching and other overheads, so we generally get better performance than with an excess of hand-made threads.
Drawbacks:
The drawback of this approach is that the developer doesn't have much control over work items queued to the thread pool. Once we have queued a work item, we do not get a reference to it, and there is no built-in support for knowing when it completed, blocking until it completes, cancelling it, or identifying it in the debugger via some sort of ID. Plus, there is no guarantee of when your work item will start to run, so progress is unknown.
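Because QueueUserWorkItem hands nothing back, waiting for completion has to be wired up by hand. The sketch below shows one possible way to do that with a CountdownEvent (available from .NET 4); the class and method names are hypothetical, and the Interlocked.Increment call is only a stand-in for real work.

```csharp
using System;
using System.Threading;

static class ThreadPoolSketch
{
    // Queues `count` work items and blocks until every one of them has run.
    // Returns how many work items actually executed.
    public static int RunAndWait(int count)
    {
        int completed = 0;
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    Interlocked.Increment(ref completed);   // stand-in for real work
                    done.Signal();                          // manual "I am finished" signal
                });
            }
            done.Wait();                                    // our hand-made join
        }
        return completed;
    }

    static void Main()
    {
        Console.WriteLine(RunAndWait(10));
    }
}
```

Even with this plumbing in place there is still no way to cancel a queued item or to ask an individual item for its status.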
Using Parallel Extensions:
This approach is the one we are here to discuss. Task-based programming combines the benefits of both approaches and adds some new ones. With task-based programming, you create Tasks instead of Threads and start executing them. Programmatically they are almost identical to thread-based programming, with additional features. Here is a code snippet that uses the new Task class:
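// Task.Create is from the early Parallel Extensions CTP; the released .NET 4 API spells this Task.Factory.StartNew.
Task t = Task.Factory.StartNew(DoSomeWorkMethod, someInputValue);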
Benefits:
- With task-based programming, by default, you can be reasonably sure that you are not hurting performance through excessive multitasking, since the task scheduler (TaskManager in the early CTPs, TaskScheduler in .NET 4) queues additional tasks when all CPUs are busy.
- With task-based programming you get a handle to each Task and far more control than with a thread pool work item: you can wait on it, attach continuations, request cancellation, etc. Note that, unlike a Thread, a Task cannot be forcibly paused or aborted; cancellation is cooperative.
- The scheduling algorithm is improved over the thread pool approach with new features like “work stealing”.
- You get additional constructs to work with easily, such as Parallel LINQ (PLINQ) and the parallel for loop (Parallel.For).
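The benefits in the list above can be sketched in a few lines each. This is only an illustration against the .NET 4 API (Task.Factory.StartNew, ContinueWith, CancellationTokenSource, Parallel.For, AsParallel); the class name TaskSketch and its methods are made up for this post.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class TaskSketch
{
    // Starts a task, chains a continuation on it, and waits for the final result.
    public static int SquareThenAddOne(int x)
    {
        Task<int> square = Task.Factory.StartNew(() => x * x);
        Task<int> plusOne = square.ContinueWith(t => t.Result + 1);
        return plusOne.Result;                  // blocks until both tasks have finished
    }

    // Cancellation is cooperative: a task started with an already cancelled
    // token never runs and ends up in the Canceled state.
    public static bool CancelBeforeStart()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel();
        Task t = Task.Factory.StartNew(() => { }, cts.Token);
        try { t.Wait(); }
        catch (AggregateException) { /* wraps a TaskCanceledException */ }
        return t.IsCanceled;
    }

    // Parallel for loop: the scheduler decides how many iterations run at once.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n, i => Interlocked.Add(ref total, (long)i * i));
        return total;
    }

    // The same computation expressed with Parallel LINQ.
    public static long SumOfSquaresPlinq(int n)
    {
        return Enumerable.Range(0, n).AsParallel().Sum(i => (long)i * i);
    }

    static void Main()
    {
        Console.WriteLine(SquareThenAddOne(4));
        Console.WriteLine(CancelBeforeStart());
        Console.WriteLine(SumOfSquares(10));
        Console.WriteLine(SumOfSquaresPlinq(10));
    }
}
```

In none of these methods do I pick a thread count; that decision is left to the scheduler, which is the point of the first benefit.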
Drawbacks:
Personally, the only drawback I see is that you are not guaranteed when the Task will execute. So if you need your work to start executing as soon as it is created, use the plain old threading way. UI multitasking, as an example, should be done with threading and not with the thread pool or task-based programming.
I will also share one interesting incident that happened while I was developing a proof of concept. I was running the samples on a 4-core machine and created 2 threads to do a piece of work, then 2 Tasks to do the same work, and I could not see any improvement. Then I realized that when I create fewer threads than there are free CPUs, each thread already gets a core to itself, so tasks have nothing to improve on, and plain threads might sometimes even perform slightly better, though that is not a guarantee. With task-based programming, I can expect that either all my Tasks are running in parallel or all CPUs are busy executing my tasks.
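If you want to try a comparison like this yourself, something along the following lines should do. It is not the code from my proof of concept; the Spin workload, the class name, and the iteration count are placeholders, and the timings will vary from machine to machine, so I am not quoting any numbers.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class CompareSketch
{
    // Hypothetical CPU-bound workload.
    static long Spin(int iterations)
    {
        long acc = 0;
        for (int i = 0; i < iterations; i++) acc += i % 7;
        return acc;
    }

    // Runs the workload on `workers` dedicated threads and sums the results.
    public static long RunWithThreads(int workers, int iterations)
    {
        long total = 0;
        var threads = new Thread[workers];
        for (int w = 0; w < workers; w++)
        {
            threads[w] = new Thread(() => Interlocked.Add(ref total, Spin(iterations)));
            threads[w].Start();
        }
        foreach (var t in threads) t.Join();
        return total;
    }

    // Runs the same workload as `workers` tasks and sums the results.
    public static long RunWithTasks(int workers, int iterations)
    {
        var tasks = new Task<long>[workers];
        for (int w = 0; w < workers; w++)
            tasks[w] = Task.Factory.StartNew(() => Spin(iterations));
        Task.WaitAll(tasks);
        long total = 0;
        foreach (var t in tasks) total += t.Result;
        return total;
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        RunWithThreads(2, 50000000);
        Console.WriteLine("Threads: " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        RunWithTasks(2, 50000000);
        Console.WriteLine("Tasks:   " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Try raising the worker count above the number of cores on your machine; that is where I would expect the two approaches to start behaving differently.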
I will try to cover these benefits in more depth in future posts, but that is all for now.
I hope this post was useful. If you are a Microsoft Partner, please feel free to involve our team for all your development consultation needs by contacting us here. Comments and corrections are welcome; I love them.
Rahul Gangwar