How to: Write a parallel_for_each
Loop
This example shows how to use the concurrency::parallel_for_each
algorithm to compute the count of prime numbers in a std::array
object in parallel.
Example
The following example computes the count of prime numbers in an array two times. The example first uses the std::for_each
algorithm to compute the count serially. The example then uses the parallel_for_each
algorithm to perform the same task in parallel. The example also prints to the console the time that is required to perform both computations.
// parallel-count-primes.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <iostream>
#include <algorithm>
#include <array>
using namespace concurrency;
using namespace std;
// Returns the number of milliseconds that it takes to call the passed in function.
template <class Function>
__int64 time_call(Function&& f)
{
__int64 begin = GetTickCount();
f();
return GetTickCount() - begin;
}
// Determines whether the input is a prime.
bool is_prime(int n)
{
if (n < 2)
{
return false;
}
for (int i = 2; i < int(std::sqrt(n)) + 1; ++i)
{
if (n % i == 0)
{
return false;
}
}
return true;
}
int wmain()
{
// Create an array object that contains 200000 integers.
array<int, 200000> a;
// Initialize the array such that a[i] == i.
int n = 0;
generate(begin(a), end(a), [&]
{
return n++;
});
// Use the for_each algorithm to count, serially, the number
// of prime numbers in the array.
LONG prime_count = 0L;
__int64 elapsed = time_call([&]
{
for_each(begin(a), end(a), [&](int n)
{
if (is_prime(n))
{
++prime_count;
}
});
});
wcout << L"serial version: " << endl
<< L"found " << prime_count << L" prime numbers" << endl
<< L"took " << elapsed << L" ms" << endl << endl;
// Use the parallel_for_each algorithm to count, in parallel, the number
// of prime numbers in the array.
prime_count = 0L;
elapsed = time_call([&]
{
parallel_for_each(begin(a), end(a), [&](int n)
{
if (is_prime(n))
{
InterlockedIncrement(&prime_count);
}
});
});
wcout << L"parallel version: " << endl
<< L"found " << prime_count << L" prime numbers" << endl
<< L"took " << elapsed << L" ms" << endl << endl;
}
The following sample output is for a computer that has four cores.
serial version:
found 17984 prime numbers
took 125 ms
parallel version:
found 17984 prime numbers
took 63 ms
Compiling the Code
To compile the code, copy it and then paste it in a Visual Studio project, or paste it in a file that is named parallel-count-primes.cpp
and then run the following command in a Visual Studio Command Prompt window.
cl.exe /EHsc parallel-count-primes.cpp
Robust Programming
The lambda expression that the example passes to the parallel_for_each
algorithm uses the InterlockedIncrement
function to enable parallel iterations of the loop to increment the counter simultaneously. If you use functions such as InterlockedIncrement
to synchronize access to shared resources, you can present performance bottlenecks in your code. You can use a lock-free synchronization mechanism, for example, the concurrency::combinable
class, to eliminate simultaneous access to shared resources. For an example that uses the combinable
class in this manner, see How to: Use combinable to improve performance.