General Best Practices in the Concurrency Runtime
This document describes best practices that apply to multiple areas of the Concurrency Runtime.
Use Cooperative Synchronization Constructs When Possible
The Concurrency Runtime provides many concurrency-safe constructs that do not require an external synchronization object. For example, the concurrency::concurrent_vector class provides concurrency-safe append and element access operations. Here, concurrency-safe means pointers or iterators are always valid. It's not a guarantee of element initialization, or of a particular traversal order. However, for cases where you require exclusive access to a resource, the runtime provides the concurrency::critical_section, concurrency::reader_writer_lock, and concurrency::event classes. These types behave cooperatively; therefore, the task scheduler can reallocate processing resources to another context as the first task waits for data. When possible, use these synchronization types instead of other synchronization mechanisms, such as those provided by the Windows API, which do not behave cooperatively. For more information about these synchronization types and a code example, see Synchronization Data Structures and Comparing Synchronization Data Structures to the Windows API.
Avoid Lengthy Tasks That Do Not Yield
Because the task scheduler behaves cooperatively, it does not provide fairness among tasks. Therefore, a task can prevent other tasks from starting. Although this is acceptable in some cases, in other cases this can cause deadlock or starvation.
The following example performs more tasks than the number of allocated processing resources. The first task does not yield to the task scheduler and therefore the second task does not start until the first task finishes.
// cooperative-tasks.cpp
// compile with: /EHsc
#include <ppl.h>
#include <iostream>
#include <sstream>

using namespace concurrency;
using namespace std;

// Data that the application passes to lightweight tasks.
struct task_data_t
{
    int id;  // a unique task identifier.
    event e; // signals that the task has finished.
};

// A lightweight task that performs a lengthy operation.
void task(void* data)
{
    task_data_t* task_data = reinterpret_cast<task_data_t*>(data);

    // Create a large loop that occasionally prints a value to the console.
    int i;
    for (i = 0; i < 1000000000; ++i)
    {
        if (i > 0 && (i % 250000000) == 0)
        {
            wstringstream ss;
            ss << task_data->id << L": " << i << endl;
            wcout << ss.str();
        }
    }
    wstringstream ss;
    ss << task_data->id << L": " << i << endl;
    wcout << ss.str();

    // Signal to the caller that the task is finished.
    task_data->e.set();
}

int wmain()
{
    // For illustration, limit the number of concurrent
    // tasks to one.
    Scheduler::SetDefaultSchedulerPolicy(SchedulerPolicy(2,
        MinConcurrency, 1, MaxConcurrency, 1));

    // Schedule two tasks. The identifiers match the sample output.
    task_data_t t1;
    t1.id = 1;
    CurrentScheduler::ScheduleTask(task, &t1);

    task_data_t t2;
    t2.id = 2;
    CurrentScheduler::ScheduleTask(task, &t2);

    // Wait for the tasks to finish.
    t1.e.wait();
    t2.e.wait();
}
This example produces the following output:
1: 250000000
1: 500000000
1: 750000000
1: 1000000000
2: 250000000
2: 500000000
2: 750000000
2: 1000000000
There are several ways to enable cooperation between the two tasks. One way is to occasionally yield to the task scheduler in a long-running task. The following example modifies the task function to call the concurrency::Context::Yield method to yield execution to the task scheduler so that another task can run.
// A lightweight task that performs a lengthy operation.
void task(void* data)
{
    task_data_t* task_data = reinterpret_cast<task_data_t*>(data);

    // Create a large loop that occasionally prints a value to the console.
    int i;
    for (i = 0; i < 1000000000; ++i)
    {
        if (i > 0 && (i % 250000000) == 0)
        {
            wstringstream ss;
            ss << task_data->id << L": " << i << endl;
            wcout << ss.str();

            // Yield control back to the task scheduler.
            Context::Yield();
        }
    }
    wstringstream ss;
    ss << task_data->id << L": " << i << endl;
    wcout << ss.str();

    // Signal to the caller that the thread is finished.
    task_data->e.set();
}
This example produces the following output:
1: 250000000
2: 250000000
1: 500000000
2: 500000000
1: 750000000
2: 750000000
1: 1000000000
2: 1000000000
The Context::Yield method yields only to another active thread on the scheduler to which the current thread belongs, to a lightweight task, or to another operating system thread. This method does not yield to work that is scheduled to run in a concurrency::task_group or concurrency::structured_task_group object but has not yet started.
There are other ways to enable cooperation among long-running tasks. You can break a large task into smaller subtasks. You can also enable oversubscription during a lengthy task. Oversubscription lets you create more threads than the available number of hardware threads. Oversubscription is especially useful when a lengthy task contains a high amount of latency, for example, reading data from disk or from a network connection. For more information about lightweight tasks and oversubscription, see Task Scheduler.
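As a rough illustration of breaking a lengthy operation into smaller pieces with a yield point between them, the following sketch uses only the C++ standard library so that it stays portable. The names chunked_sum and yield_point are hypothetical and are not part of the Concurrency Runtime. The sketch assumes that, in a runtime task, the callback passed as yield_point would call concurrency::Context::Yield; here it defaults to std::this_thread::yield.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>

// Sums the integers in [0, count) in chunks of chunk_size and invokes
// yield_point after each chunk so that other work gets a chance to run.
// yield_point is a hypothetical hook: in a Concurrency Runtime task it
// could call concurrency::Context::Yield. By default it calls the
// standard-library std::this_thread::yield.
std::uint64_t chunked_sum(
    std::uint64_t count,
    std::uint64_t chunk_size,
    const std::function<void()>& yield_point = [] { std::this_thread::yield(); })
{
    assert(chunk_size > 0);

    std::uint64_t total = 0;
    for (std::uint64_t begin = 0; begin < count; begin += chunk_size)
    {
        // Clamp the last chunk to the end of the range.
        const std::uint64_t end =
            (count - begin < chunk_size) ? count : begin + chunk_size;

        for (std::uint64_t i = begin; i < end; ++i)
        {
            total += i;
        }

        // Give other work a chance to run before the next chunk starts.
        yield_point();
    }
    return total;
}
```

A smaller chunk size yields more often and gives other tasks more opportunities to run, at the cost of more scheduling overhead. The right value depends on how long each chunk takes.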
Use Oversubscription to Offset Operations That Block or Have High Latency
The Concurrency Runtime provides synchronization primitives, such as concurrency::critical_section, that enable tasks to cooperatively block and yield to each other. When one task cooperatively blocks or yields, the task scheduler can reallocate processing resources to another context as the first task waits for data.
There are cases in which you cannot use the cooperative blocking mechanism that is provided by the Concurrency Runtime. For example, an external library that you use might use a different synchronization mechanism. Another example is when you perform an operation that could have a high amount of latency, for example, when you use the Windows API ReadFile function to read data from a network connection. In these cases, oversubscription can enable other tasks to run when another task is idle. Oversubscription lets you create more threads than the available number of hardware threads.
Consider the following function, download, which downloads the file at the given URL. This example uses the concurrency::Context::Oversubscribe method to temporarily increase the number of active threads.
// Downloads the file at the given URL.
string download(const string& url)
{
    // Enable oversubscription.
    Context::Oversubscribe(true);

    // Download the file.
    string content = GetHttpFile(_session, url.c_str());

    // Disable oversubscription.
    Context::Oversubscribe(false);

    return content;
}
Because the GetHttpFile function performs an operation that can have high latency, oversubscription can enable other tasks to run as the current task waits for data. For the complete version of this example, see How to: Use Oversubscription to Offset Latency.
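If the blocking call throws, a function that enables and then disables oversubscription by hand can exit before the second call runs. One way to guard against that is a small RAII helper that turns a setting on in its constructor and off in its destructor. The following is a minimal, portable sketch of that idea. The names scoped_toggle and fetch are hypothetical, and fetch is only a stand-in for a high-latency call. With the Concurrency Runtime, the callable passed in could be a lambda that forwards its argument to concurrency::Context::Oversubscribe.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Calls toggle(true) on construction and toggle(false) on destruction,
// so the setting is restored on every exit path, including exceptions.
// Hypothetical helper; not part of the Concurrency Runtime.
class scoped_toggle
{
public:
    explicit scoped_toggle(std::function<void(bool)> toggle)
        : _toggle(std::move(toggle))
    {
        assert(_toggle);
        _toggle(true);
    }

    ~scoped_toggle()
    {
        _toggle(false);
    }

    scoped_toggle(const scoped_toggle&) = delete;
    scoped_toggle& operator=(const scoped_toggle&) = delete;

private:
    std::function<void(bool)> _toggle;
};

// Hypothetical stand-in for a blocking, high-latency operation.
std::string fetch(const std::string& url)
{
    if (url.empty())
    {
        throw std::invalid_argument("empty URL");
    }
    return "content of " + url;
}

// Downloads the given URL with the setting enabled for the duration
// of the call. The setting is disabled again even if fetch throws.
std::string download(const std::string& url,
                     const std::function<void(bool)>& oversubscribe)
{
    scoped_toggle guard(oversubscribe);
    return fetch(url);
}
```

Passing the toggle as a callable keeps the sketch independent of any particular runtime and makes the enable/disable pairing easy to observe in a test.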
Use Concurrent Memory Management Functions When Possible
Use the memory management functions, concurrency::Alloc and concurrency::Free, when you have fine-grained tasks that frequently allocate small objects that have a relatively short lifetime. The Concurrency Runtime holds a separate memory cache for each running thread. The Alloc and Free functions allocate and free memory from these caches without the use of locks or memory barriers.
For more information about these memory management functions, see Task Scheduler. For an example that uses these functions, see How to: Use Alloc and Free to Improve Memory Performance.
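One common way to route a small type's allocations through a pair of allocation functions is to give the type its own operator new and operator delete. The following sketch shows that pattern in portable C++. The functions cache_alloc and cache_free are hypothetical stand-ins that forward to std::malloc and std::free and count live blocks. In code that targets the Concurrency Runtime, the assumption is that they would be replaced by calls to concurrency::Alloc and concurrency::Free.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Number of blocks currently allocated through cache_alloc.
std::atomic<int> live_blocks{0};

// Hypothetical stand-in for concurrency::Alloc.
void* cache_alloc(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (p == nullptr)
    {
        throw std::bad_alloc();
    }
    ++live_blocks;
    return p;
}

// Hypothetical stand-in for concurrency::Free.
void cache_free(void* p) noexcept
{
    if (p != nullptr)
    {
        std::free(p);
        --live_blocks;
    }
}

// A small, short-lived type whose heap allocations go through the
// functions above instead of the global operator new and delete.
struct point
{
    double x;
    double y;

    point(double x_, double y_) : x(x_), y(y_) {}

    static void* operator new(std::size_t bytes) { return cache_alloc(bytes); }
    static void operator delete(void* p) noexcept { cache_free(p); }
};

// Allocates a temporary point, uses it, and releases it on scope exit.
double sum_coordinates(double x, double y)
{
    std::unique_ptr<point> p(new point(x, y));
    assert(p != nullptr);
    return p->x + p->y;
}
```

Because the operators are defined on the type, callers keep writing ordinary new and delete expressions (or use std::unique_ptr) and the allocation strategy can change in one place.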
Use RAII to Manage the Lifetime of Concurrency Objects
The Concurrency Runtime uses exception handling to implement features such as cancellation. Therefore, write exception-safe code when you call into the runtime or call another library that calls into the runtime.
The Resource Acquisition Is Initialization (RAII) pattern is one way to safely manage the lifetime of a concurrency object under a given scope. Under the RAII pattern, a data structure is allocated on the stack. That data structure initializes or acquires a resource when it is created and destroys or releases that resource when the data structure is destroyed. The RAII pattern guarantees that the destructor is called before the enclosing scope exits. This pattern is useful when a function contains multiple return statements. This pattern also helps you write exception-safe code. When a throw statement causes the stack to unwind, the destructor for the RAII object is called; therefore, the resource is always correctly deleted or released.
The runtime defines several classes that use the RAII pattern, for example, concurrency::critical_section::scoped_lock and concurrency::reader_writer_lock::scoped_lock. These helper classes are known as scoped locks. These classes provide several benefits when you work with concurrency::critical_section or concurrency::reader_writer_lock objects. The constructor of these classes acquires access to the provided critical_section or reader_writer_lock object; the destructor releases access to that object. Because a scoped lock releases access to its mutual exclusion object automatically when it is destroyed, you do not manually unlock the underlying object.
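To make the constructor/destructor pairing concrete, the following is a minimal sketch of how a scoped lock can be written. It uses std::mutex from the C++ standard library rather than a Concurrency Runtime type so that it is portable. The names scoped_lock_sketch and apply_transaction are hypothetical, and the sketch is not the runtime's implementation of its scoped_lock classes.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Acquires the mutex in the constructor and releases it in the
// destructor. Hypothetical illustration of the scoped-lock idea.
class scoped_lock_sketch
{
public:
    explicit scoped_lock_sketch(std::mutex& m) : _m(m) { _m.lock(); }
    ~scoped_lock_sketch() { _m.unlock(); }

    scoped_lock_sketch(const scoped_lock_sketch&) = delete;
    scoped_lock_sketch& operator=(const scoped_lock_sketch&) = delete;

private:
    std::mutex& _m;
};

// Adds amount to balance while holding the lock. Throws, without
// changing the balance, if the result would be negative. The lock is
// released on both the normal and the exceptional path.
int apply_transaction(std::mutex& m, int& balance, int amount)
{
    scoped_lock_sketch lock(m);

    if (balance + amount < 0)
    {
        throw std::runtime_error("insufficient funds");
    }

    balance += amount;
    assert(balance >= 0);
    return balance;
}
```

Note that apply_transaction contains no explicit unlock call. If the lock were not released when the exception propagates, the next call on the same mutex would never acquire it.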
Consider the following class, account, which is defined by an external library and therefore cannot be modified.
// account.h
#pragma once
#include <exception>
#include <sstream>

// Represents a bank account.
class account
{
public:
    explicit account(int initial_balance = 0)
        : _balance(initial_balance)
    {
    }

    // Retrieves the current balance.
    int balance() const
    {
        return _balance;
    }

    // Deposits the specified amount into the account.
    int deposit(int amount)
    {
        _balance += amount;
        return _balance;
    }

    // Withdraws the specified amount from the account.
    int withdraw(int amount)
    {
        if (_balance < 0)
        {
            std::stringstream ss;
            ss << "negative balance: " << _balance << std::endl;
            throw std::exception((ss.str().c_str()));
        }

        _balance -= amount;
        return _balance;
    }

private:
    // The current balance.
    int _balance;
};
The following example performs multiple transactions on an account object in parallel. The example uses a critical_section object to synchronize access to the account object because the account class is not concurrency-safe. Each parallel operation uses a critical_section::scoped_lock object to guarantee that the critical_section object is unlocked when the operation either succeeds or fails. When the account balance is negative, the withdraw operation fails by throwing an exception.
// account-transactions.cpp
// compile with: /EHsc
#include "account.h"
#include <ppl.h>
#include <iostream>
#include <sstream>

using namespace concurrency;
using namespace std;

int wmain()
{
    // Create an account that has an initial balance of 1924.
    account acc(1924);

    // Synchronizes access to the account object because the account class is
    // not concurrency-safe.
    critical_section cs;

    // Perform multiple transactions on the account in parallel.
    try
    {
        parallel_invoke(
            [&acc, &cs] {
                critical_section::scoped_lock lock(cs);
                wcout << L"Balance before deposit: " << acc.balance() << endl;
                acc.deposit(1000);
                wcout << L"Balance after deposit: " << acc.balance() << endl;
            },
            [&acc, &cs] {
                critical_section::scoped_lock lock(cs);
                wcout << L"Balance before withdrawal: " << acc.balance() << endl;
                acc.withdraw(50);
                wcout << L"Balance after withdrawal: " << acc.balance() << endl;
            },
            [&acc, &cs] {
                critical_section::scoped_lock lock(cs);
                wcout << L"Balance before withdrawal: " << acc.balance() << endl;
                acc.withdraw(3000);
                wcout << L"Balance after withdrawal: " << acc.balance() << endl;
            }
        );
    }
    catch (const exception& e)
    {
        wcout << L"Error details:" << endl << L"\t" << e.what() << endl;
    }
}
This example produces the following sample output:
Balance before deposit: 1924
Balance after deposit: 2924
Balance before withdrawal: 2924
Balance after withdrawal: -76
Balance before withdrawal: -76
Error details:
negative balance: -76
For additional examples that use the RAII pattern to manage the lifetime of concurrency objects, see Walkthrough: Removing Work from a User-Interface Thread, How to: Use the Context Class to Implement a Cooperative Semaphore, and How to: Use Oversubscription to Offset Latency.
Do Not Create Concurrency Objects at Global Scope
Creating a concurrency object at global scope can cause issues such as deadlock or memory access violations in your application.
For example, when you create a Concurrency Runtime object, the runtime creates a default scheduler for you if one was not yet created. A runtime object that is created during global object construction will accordingly cause the runtime to create this default scheduler. However, this process takes an internal lock, which can interfere with the initialization of other objects that support the Concurrency Runtime infrastructure. This internal lock might be required by another infrastructure object that has not yet been initialized, and can thus cause deadlock to occur in your application.
The following example demonstrates the creation of a global concurrency::Scheduler object. This pattern applies not only to the Scheduler class but also to all other types that are provided by the Concurrency Runtime. We recommend that you do not follow this pattern because it can cause unexpected behavior in your application.
// global-scheduler.cpp
// compile with: /EHsc
#include <concrt.h>

using namespace concurrency;

static_assert(false, "This example illustrates a non-recommended practice.");

// Create a Scheduler object at global scope.
// BUG: This practice is not recommended because it can cause deadlock.
Scheduler* globalScheduler = Scheduler::Create(SchedulerPolicy(2,
    MinConcurrency, 2, MaxConcurrency, 4));

int wmain()
{
}
For examples of the correct way to create Scheduler objects, see Task Scheduler.
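As a general illustration of the alternative to a global object, the following portable sketch creates the object inside a function that runs after the program's entry point and passes it by reference to the code that needs it. The type worker_pool is a hypothetical stand-in for a runtime-backed object; it is not a Concurrency Runtime type, and the sketch is not taken from the Task Scheduler documentation.

```cpp
#include <cassert>

// Number of worker_pool objects that currently exist.
int live_pools = 0;

// Hypothetical stand-in for an object that depends on runtime
// infrastructure, such as a scheduler.
class worker_pool
{
public:
    explicit worker_pool(int max_workers) : _max_workers(max_workers)
    {
        ++live_pools;
    }

    ~worker_pool()
    {
        --live_pools;
    }

    worker_pool(const worker_pool&) = delete;
    worker_pool& operator=(const worker_pool&) = delete;

    int max_workers() const { return _max_workers; }

private:
    int _max_workers;
};

// Code that needs the pool receives it by reference instead of
// reaching for a global variable.
int do_work(const worker_pool& pool)
{
    assert(live_pools == 1);
    return pool.max_workers() * 2;
}

// Intended to be called from main. The pool is constructed after
// global initialization has finished and is destroyed before this
// function returns.
int run_application()
{
    worker_pool pool(4);
    return do_work(pool);
}
```

Scoping the object to a function keeps its construction and destruction away from global initialization and shutdown, which is where the ordering problems described above arise.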
Do Not Use Concurrency Objects in Shared Data Segments
The Concurrency Runtime does not support the use of concurrency objects in a shared data section, for example, a data section that is created by the data_seg #pragma directive. A concurrency object that is shared across process boundaries could put the runtime in an inconsistent or invalid state.
See also
Concurrency Runtime Best Practices
Parallel Patterns Library (PPL)
Asynchronous Agents Library
Task Scheduler
Synchronization Data Structures
Comparing Synchronization Data Structures to the Windows API
How to: Use Alloc and Free to Improve Memory Performance
How to: Use Oversubscription to Offset Latency
How to: Use the Context Class to Implement a Cooperative Semaphore
Walkthrough: Removing Work from a User-Interface Thread
Best Practices in the Parallel Patterns Library
Best Practices in the Asynchronous Agents Library