OpenMP Directives
Provides links to directives used in the OpenMP API.
Visual C++ supports the following OpenMP directives.
For parallel work-sharing:
Directive | Description |
---|---|
parallel | Defines a parallel region, which is code that will be executed by multiple threads in parallel. |
for | Causes the work done in a for loop inside a parallel region to be divided among threads. |
sections | Identifies code sections to be divided among all threads. |
single | Lets you specify that a section of code should be executed on a single thread, not necessarily the main thread. |
For main thread and synchronization:
Directive | Description |
---|---|
master | Specifies that only the main thread should execute a section of the program. |
critical | Specifies that code is only executed on one thread at a time. |
barrier | Synchronizes all threads in a team; all threads pause at the barrier, until all threads execute the barrier. |
atomic | Specifies that a memory location will be updated atomically. |
flush | Specifies that all threads have the same view of memory for all shared objects. |
ordered | Specifies that code under a parallelized for loop should be executed like a sequential loop. |
For data environment:
Directive | Description |
---|---|
threadprivate | Specifies that a variable is private to a thread. |
atomic
Specifies that a memory location will be updated atomically.
#pragma omp atomic
expression
Parameters
expression
The statement that has the lvalue, whose memory location you want to protect against more than one write.
Remarks
The atomic directive supports no clauses.
For more information, see 2.6.4 atomic construct.
Example
// omp_atomic.cpp
// compile with: /openmp
#include <stdio.h>
#include <omp.h>
#define MAX 10
int main() {
    int count = 0;
    #pragma omp parallel num_threads(MAX)
    {
        #pragma omp atomic
        count++;
    }
    printf_s("Number of threads: %d\n", count);
}
Number of threads: 10
barrier
Synchronizes all threads in a team; all threads pause at the barrier, until all threads execute the barrier.
#pragma omp barrier
Remarks
The barrier directive supports no clauses.
For more information, see 2.6.3 barrier directive.
Example
For a sample of how to use barrier, see master.
critical
Specifies that code is only executed on one thread at a time.
#pragma omp critical [(name)]
{
code_block
}
Parameters
name
(Optional) A name to identify the critical code. The name must be enclosed in parentheses.
Remarks
The critical directive supports no clauses.
For more information, see 2.6.2 critical construct.
Example
// omp_critical.cpp
// compile with: /openmp
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define SIZE 10
int main()
{
    int i;
    int max;
    int a[SIZE];
    for (i = 0; i < SIZE; i++)
    {
        a[i] = rand();
        printf_s("%d\n", a[i]);
    }
    max = a[0];
    #pragma omp parallel for num_threads(4)
    for (i = 1; i < SIZE; i++)
    {
        if (a[i] > max)
        {
            #pragma omp critical
            {
                // compare a[i] and max again because max
                // could have been changed by another thread after
                // the comparison outside the critical section
                if (a[i] > max)
                    max = a[i];
            }
        }
    }
    printf_s("max = %d\n", max);
}
41
18467
6334
26500
19169
15724
11478
29358
26962
24464
max = 29358
flush
Specifies that all threads have the same view of memory for all shared objects.
#pragma omp flush [(var)]
Parameters
var
(Optional) A comma-separated list of variables that represent objects you want to synchronize. If var isn't specified, all memory is flushed.
Remarks
The flush directive supports no clauses.
For more information, see 2.6.5 flush directive.
Example
// omp_flush.cpp
// compile with: /openmp
#include <stdio.h>
#include <omp.h>
void read(int *data) {
    printf_s("read data\n");
    *data = 1;
}
void process(int *data) {
    printf_s("process data\n");
    (*data)++;
}
int main() {
    int data;
    int flag;
    flag = 0;
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            printf_s("Thread %d: ", omp_get_thread_num( ));
            read(&data);
            #pragma omp flush(data)
            flag = 1;
            #pragma omp flush(flag)
            // Do more work.
        }
        #pragma omp section
        {
            while (!flag) {
                #pragma omp flush(flag)
            }
            #pragma omp flush(data)
            printf_s("Thread %d: ", omp_get_thread_num( ));
            process(&data);
            printf_s("data = %d\n", data);
        }
    }
}
Thread 0: read data
Thread 1: process data
data = 2
for
Causes the work done in a for loop inside a parallel region to be divided among threads.
#pragma omp [parallel] for [clauses]
for_statement
Parameters
clauses
(Optional) Zero or more clauses; see the Remarks section.
for_statement
A for loop. Undefined behavior will result if user code in the for loop changes the index variable.
Remarks
The for directive supports the following clauses: firstprivate, lastprivate, nowait, ordered, private, reduction, and schedule.
If parallel is also specified, clauses can be any clause accepted by the parallel or for directives, except nowait.
For more information, see 2.4.1 for construct.
Example
// omp_for.cpp
// compile with: /openmp
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
#define NUM_THREADS 4
#define NUM_START 1
#define NUM_END 10
int main() {
    int i, nRet = 0, nSum = 0, nStart = NUM_START, nEnd = NUM_END;
    int nThreads = 0, nTmp = nStart + nEnd;
    unsigned uTmp = (unsigned((abs(nStart - nEnd) + 1)) *
                     unsigned(abs(nTmp))) / 2;
    int nSumCalc = uTmp;
    if (nTmp < 0)
        nSumCalc = -nSumCalc;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel default(none) private(i) shared(nSum, nThreads, nStart, nEnd)
    {
        #pragma omp master
        nThreads = omp_get_num_threads();
        #pragma omp for
        for (i=nStart; i<=nEnd; ++i) {
            #pragma omp atomic
            nSum += i;
        }
    }
    if (nThreads == NUM_THREADS) {
        printf_s("%d OpenMP threads were used.\n", NUM_THREADS);
        nRet = 0;
    }
    else {
        printf_s("Expected %d OpenMP threads, but %d were used.\n",
                 NUM_THREADS, nThreads);
        nRet = 1;
    }
    if (nSum != nSumCalc) {
        printf_s("The sum of %d through %d should be %d, "
                 "but %d was reported!\n",
                 NUM_START, NUM_END, nSumCalc, nSum);
        nRet = 1;
    }
    else
        printf_s("The sum of %d through %d is %d\n",
                 NUM_START, NUM_END, nSum);
    // Report failure through the exit code.
    return nRet;
}
4 OpenMP threads were used.
The sum of 1 through 10 is 55
master
Specifies that only the main thread should execute a section of the program.
#pragma omp master
{
code_block
}
Remarks
The master directive supports no clauses.
For more information, see 2.6.1 master construct.
To specify that a section of code should be executed on a single thread, not necessarily the main thread, use the single directive instead.
Example
// compile with: /openmp
#include <omp.h>
#include <stdio.h>
int main( )
{
    int a[5], i;
    #pragma omp parallel
    {
        // Perform some computation.
        #pragma omp for
        for (i = 0; i < 5; i++)
            a[i] = i * i;
        // Print intermediate results.
        #pragma omp master
        for (i = 0; i < 5; i++)
            printf_s("a[%d] = %d\n", i, a[i]);
        // Wait.
        #pragma omp barrier
        // Continue with the computation.
        #pragma omp for
        for (i = 0; i < 5; i++)
            a[i] += i;
    }
}
a[0] = 0
a[1] = 1
a[2] = 4
a[3] = 9
a[4] = 16
ordered
Specifies that code under a parallelized for loop should be executed like a sequential loop.
#pragma omp ordered
structured-block
Remarks
The ordered directive must be within the dynamic extent of a for or parallel for construct with an ordered clause.
The ordered directive supports no clauses.
For more information, see 2.6.6 ordered construct.
Example
// omp_ordered.cpp
// compile with: /openmp
#include <stdio.h>
#include <omp.h>
static float a[1000], b[1000], c[1000];
void test(int first, int last)
{
    #pragma omp for schedule(static) ordered
    for (int i = first; i <= last; ++i) {
        // Do something here.
        if (i % 2)
        {
            #pragma omp ordered
            printf_s("test() iteration %d\n", i);
        }
    }
}
void test2(int iter)
{
    #pragma omp ordered
    printf_s("test2() iteration %d\n", iter);
}
int main( )
{
    int i;
    #pragma omp parallel
    {
        test(1, 8);
        #pragma omp for ordered
        for (i = 0 ; i < 5 ; i++)
            test2(i);
    }
}
test() iteration 1
test() iteration 3
test() iteration 5
test() iteration 7
test2() iteration 0
test2() iteration 1
test2() iteration 2
test2() iteration 3
test2() iteration 4
parallel
Defines a parallel region, which is code that will be executed by multiple threads in parallel.
#pragma omp parallel [clauses]
{
code_block
}
Parameters
clauses
(Optional) Zero or more clauses; see the Remarks section.
Remarks
The parallel directive supports the following clauses: copyin, default, firstprivate, if, num_threads, private, reduction, and shared.
parallel can also be used with the for and sections directives.
For more information, see 2.3 parallel construct.
Example
The following sample shows how to set the number of threads and define a parallel region. The number of threads is equal by default to the number of logical processors on the machine. For example, if you have a machine with one physical processor that has hyperthreading enabled, it will have two logical processors and two threads. The order of output can vary on different machines.
// omp_parallel.cpp
// compile with: /openmp
#include <stdio.h>
#include <omp.h>
int main() {
    #pragma omp parallel num_threads(4)
    {
        int i = omp_get_thread_num();
        printf_s("Hello from thread %d\n", i);
    }
}
Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 3
sections
Identifies code sections to be divided among all threads.
#pragma omp [parallel] sections [clauses]
{
#pragma omp section
{
code_block
}
}
Parameters
clauses
(Optional) Zero or more clauses; see the Remarks section.
Remarks
The sections directive can contain zero or more section directives.
The sections directive supports the following clauses: firstprivate, lastprivate, nowait, private, and reduction.
If parallel is also specified, clauses can be any clause accepted by the parallel or sections directives, except nowait.
For more information, see 2.4.2 sections construct.
Example
// omp_sections.cpp
// compile with: /openmp
#include <stdio.h>
#include <omp.h>
int main() {
    #pragma omp parallel sections num_threads(4)
    {
        printf_s("Hello from thread %d\n", omp_get_thread_num());
        #pragma omp section
        printf_s("Hello from thread %d\n", omp_get_thread_num());
    }
}
Hello from thread 0
Hello from thread 0
single
Lets you specify that a section of code should be executed on a single thread, not necessarily the main thread.
#pragma omp single [clauses]
{
code_block
}
Parameters
clauses
(Optional) Zero or more clauses; see the Remarks section.
Remarks
The single directive supports the following clauses: copyprivate, firstprivate, nowait, and private.
For more information, see 2.4.3 single construct.
To specify that a section of code should only be executed on the main thread, use the master directive instead.
Example
// omp_single.cpp
// compile with: /openmp
#include <stdio.h>
#include <omp.h>
int main() {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        // Only a single thread can read the input.
        printf_s("read input\n");
        // Multiple threads in the team compute the results.
        printf_s("compute results\n");
        #pragma omp single
        // Only a single thread can write the output.
        printf_s("write output\n");
    }
}
read input
compute results
compute results
write output
threadprivate
Specifies that a variable is private to a thread.
#pragma omp threadprivate(var)
Parameters
var
A comma-separated list of variables that you want to make private to a thread. var must be either a global- or namespace-scoped variable or a local static variable.
Remarks
The threadprivate directive supports no clauses.
The threadprivate directive is based on the thread attribute using the __declspec keyword; limits on __declspec(thread) apply to threadprivate. For example, a threadprivate variable will exist in any thread started in the process, not just those threads that are part of a thread team spawned by a parallel region. Be aware of this implementation detail; you may notice that constructors for a threadprivate user-defined type are called more often than expected.
You can use threadprivate in a DLL that is statically loaded at process startup; however, you can't use threadprivate in any DLL that will be loaded via LoadLibrary, such as DLLs that are loaded with /DELAYLOAD (delay load import), which also uses LoadLibrary.
A threadprivate variable of a destructible type isn't guaranteed to have its destructor called. For example:
struct MyType
{
    ~MyType();
};
MyType threaded_var;
#pragma omp threadprivate(threaded_var)
int main()
{
    #pragma omp parallel
    {}
}
Users have no control over when the threads constituting the parallel region will terminate. If those threads exist when the process exits, the threads won't be notified about the process exit, and the destructor won't be called for threaded_var on any thread except the one that exits (here, the primary thread). So code shouldn't count on proper destruction of threadprivate variables.
For more information, see 2.7.1 threadprivate directive.
Example
For a sample of using threadprivate, see private.