2. Directives
Directives are based on #pragma
directives defined in the C and C++ standards. Compilers that support the OpenMP C and C++ API will include a command-line option that activates and allows interpretation of all OpenMP compiler directives.
2.1 Directive format
The syntax of an OpenMP directive is formally specified by the grammar in appendix C, and informally as follows:
#pragma omp directive-name [clause[ [,] clause]...] new-line
Each directive starts with #pragma omp
, to reduce the potential for conflict with other (non-OpenMP or vendor extensions to OpenMP) pragma directives with the same names. The rest of the directive follows the conventions of the C and C++ standards for compiler directives. In particular, white space can be used before and after the #
, and sometimes white space must be used to separate the words in a directive. Preprocessing tokens following the #pragma omp
are subject to macro replacement.
Directives are case-sensitive. The order in which clauses appear in directives isn't significant. Clauses on directives may be repeated as needed, subject to the restrictions listed in the description of each clause. If variable-list appears in a clause, it must specify only variables. Only one directive-name can be specified per directive. For example, the following directive isn't allowed:
/* ERROR - multiple directive names not allowed */
#pragma omp parallel barrier
An OpenMP directive applies to at most one succeeding statement, which must be a structured block.
2.2 Conditional compilation
The _OPENMP
macro name is defined by OpenMP-compliant implementations as the decimal constant yyyymm, which will be the year and month of the approved specification. This macro must not be the subject of a #define
or a #undef
preprocessing directive.
#ifdef _OPENMP
iam = omp_get_thread_num() + index;
#endif
If vendors define extensions to OpenMP, they may specify additional predefined macros.
2.3 parallel construct
The following directive defines a parallel region, which is a region of the program that's to be executed by many threads in parallel. This directive is the fundamental construct that starts parallel execution.
#pragma omp parallel [clause[ [, ]clause] ...] new-line structured-block
The clause is one of the following:
if(
scalar-expression)
private(
variable-list)
firstprivate(
variable-list)
default(shared | none)
shared(
variable-list)
copyin(
variable-list)
reduction(
operator:
variable-list)
num_threads(
integer-expression)
When a thread gets to a parallel construct, a team of threads is created if one of the following cases is true:
- No
if
clause is present. - The
if
expression evaluates to a nonzero value.
This thread becomes the master thread of the team, with a thread number of 0, and all threads in the team, including the master thread, execute the region in parallel. If the value of the if
expression is zero, the region is serialized.
To determine the number of threads that are requested, the following rules will be considered in order. The first rule whose condition is met will be applied:
If the
num_threads
clause is present, then the value of the integer expression is the number of threads requested.If the
omp_set_num_threads
library function has been called, then the value of the argument in the most recently executed call is the number of threads requested.If the environment variable
OMP_NUM_THREADS
is defined, then the value of this environment variable is the number of threads requested.If none of the methods above is used, then the number of threads requested is implementation-defined.
If the num_threads
clause is present then it supersedes the number of threads requested by the omp_set_num_threads
library function or the OMP_NUM_THREADS
environment variable only for the parallel region it's applied to. Later parallel regions aren't affected by it.
The number of threads that execute the parallel region also depends upon whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, then the requested number of threads will execute the parallel region. If dynamic adjustment is enabled then the requested number of threads is the maximum number of threads that may execute the parallel region.
If a parallel region is encountered while dynamic adjustment of the number of threads is disabled, and the number of threads requested for the parallel region is more than the number that the run-time system can supply, the behavior of the program is implementation-defined. An implementation may, for example, interrupt the execution of the program, or it may serialize the parallel region.
The omp_set_dynamic
library function and the OMP_DYNAMIC
environment variable can be used to enable and disable dynamic adjustment of the number of threads.
The number of physical processors actually hosting the threads at any given time is implementation-defined. Once created, the number of threads in the team stays constant for the duration of that parallel region. It can be changed either explicitly by the user or automatically by the run-time system from one parallel region to another.
The statements contained within the dynamic extent of the parallel region are executed by each thread, and each thread can execute a path of statements that's different from the other threads. Directives encountered outside the lexical extent of a parallel region are referred to as orphaned directives.
There's an implied barrier at the end of a parallel region. Only the master thread of the team continues execution at the end of a parallel region.
If a thread in a team executing a parallel region encounters another parallel construct, it creates a new team, and it becomes the master of that new team. Nested parallel regions are serialized by default. As a result, by default, a nested parallel region is executed by a team composed of one thread. The default behavior may be changed by using either the runtime library function omp_set_nested
or the environment variable OMP_NESTED
. However, the number of threads in a team that execute a nested parallel region is implementation-defined.
Restrictions to the parallel
directive are as follows:
At most, one
if
clause can appear on the directive.It's unspecified whether any side effects inside the if expression or
num_threads
expression occur.A
throw
executed inside a parallel region must cause execution to resume within the dynamic extent of the same structured block, and it must be caught by the same thread that threw the exception.Only a single
num_threads
clause can appear on the directive. Thenum_threads
expression is evaluated outside the context of the parallel region, and must evaluate to a positive integer value.The order of evaluation of the
if
andnum_threads
clauses is unspecified.
Cross-references
private
,firstprivate
,default
,shared
,copyin
, andreduction
clauses (section 2.7.2)- OMP_NUM_THREADS environment variable
- omp_set_dynamic library function
- OMP_DYNAMIC environment variable
- omp_set_nested function
- OMP_NESTED environment variable
- omp_set_num_threads library function
2.4 Work-sharing constructs
A work-sharing construct distributes the execution of the associated statement among the members of the team that encounter it. The work-sharing directives don't launch new threads, and there's no implied barrier on entry to a work-sharing construct.
The sequence of work-sharing constructs and barrier
directives encountered must be the same for every thread in a team.
OpenMP defines the following work-sharing constructs, and these constructs are described in the sections that follow:
2.4.1 for construct
The for
directive identifies an iterative work-sharing construct that specifies that the iterations of the associated loop will be executed in parallel. The iterations of the for
loop are distributed across threads that already exist in the team executing the parallel construct to which it binds. The syntax of the for
construct is as follows:
#pragma omp for [clause[[,] clause] ... ] new-line for-loop
The clause is one of the following:
private(
variable-list)
firstprivate(
variable-list)
lastprivate(
variable-list)
reduction(
operator:
variable-list)
ordered
schedule(
kind [,
chunk_size])
nowait
The for
directive places restrictions on the structure of the corresponding for
loop. Specifically, the corresponding for
loop must have canonical shape:
for (
init-expr ;
var logical-op b ;
incr-expr )
init-expr
One of the following:
- var = lb
- integer-type var = lb
incr-expr
One of the following:
++
var- var
++
--
var- var
--
- var
+=
incr - var
-=
incr - var
=
var+
incr - var
=
incr+
var - var
=
var-
incr
var
A signed integer variable. If this variable would otherwise be shared, it's implicitly made private for the duration of the for
. Do not modify this variable within the body of the for
statement. Unless the variable is specified lastprivate
, its value after the loop is indeterminate.
logical-op
One of the following:
<
<=
>
>=
lb, b, and incr
Loop invariant integer expressions. There's no synchronization during the evaluation of these expressions, so any evaluated side effects produce indeterminate results.
The canonical form allows the number of loop iterations to be computed on entry to the loop. This computation is made with values in the type of var, after integral promotions. In particular, if value of b -
lb +
incr can't be represented in that type, the result is indeterminate. Further, if logical-op is <
or <=
, then incr-expr must cause var to increase on each iteration of the loop. If logical-op is >
or >=
, then incr-expr must cause var to get smaller on each iteration of the loop.
The schedule
clause specifies how iterations of the for
loop are divided among threads of the team. The correctness of a program must not depend on which thread executes a particular iteration. The value of chunk_size, if specified, must be a loop invariant integer expression with a positive value. There's no synchronization during the evaluation of this expression, so any evaluated side effects produce indeterminate results. The schedule kind can be one of the following values:
Table 2-1: schedule
clause kind values
Value | Description |
---|---|
static | When schedule(static, chunk_size ) is specified, iterations are divided into chunks of a size specified by chunk_size. The chunks are statically assigned to threads in the team in a round-robin fashion in the order of the thread number. When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, with one chunk assigned to each thread. |
dynamic | When schedule(dynamic, chunk_size ) is specified, the iterations are divided into a series of chunks, each containing chunk_size iterations. Each chunk is assigned to a thread that's waiting for an assignment. The thread executes the chunk of iterations and then waits for its next assignment, until no chunks remain to be assigned. The last chunk to be assigned may have a smaller number of iterations. When no chunk_size is specified, it defaults to 1. |
guided | When schedule(guided, chunk_size ) is specified, the iterations are assigned to threads in chunks with decreasing sizes. When a thread finishes its assigned chunk of iterations, it's dynamically assigned another chunk, until none is left. For a chunk_size of 1, the size of each chunk is approximately the number of unassigned iterations divided by the number of threads. These sizes decrease almost exponentially to 1. For a chunk_size with value k greater than 1, the sizes decrease almost exponentially to k, except that the last chunk may have fewer than k iterations. When no chunk_size is specified, it defaults to 1. |
runtime | When schedule(runtime) is specified, the decision regarding scheduling is deferred until runtime. The schedule kind and size of the chunks can be chosen at run time by setting the environment variable OMP_SCHEDULE . If this environment variable isn't set, the resulting schedule is implementation-defined. When schedule(runtime) is specified, chunk_size must not be specified. |
In the absence of an explicitly defined schedule
clause, the default schedule
is implementation-defined.
An OpenMP-compliant program shouldn't rely on a particular schedule for correct execution. A program shouldn't rely on a schedule kind conforming precisely to the description given above, because it's possible to have variations in the implementations of the same schedule kind across different compilers. The descriptions can be used to select the schedule that's appropriate for a particular situation.
The ordered
clause must be present when ordered
directives bind to the for
construct.
There's an implicit barrier at the end of a for
construct unless a nowait
clause is specified.
Restrictions to the for
directive are as follows:
The
for
loop must be a structured block, and, in addition, its execution must not be terminated by abreak
statement.The values of the loop control expressions of the
for
loop associated with afor
directive must be the same for all the threads in the team.The
for
loop iteration variable must have a signed integer type.Only a single
schedule
clause can appear on afor
directive.Only a single
ordered
clause can appear on afor
directive.Only a single
nowait
clause can appear on afor
directive.It's unspecified if or how often any side effects within the chunk_size, lb, b, or incr expressions occur.
The value of the chunk_size expression must be the same for all threads in the team.
Cross-references
private
,firstprivate
,lastprivate
, andreduction
clauses (section 2.7.2)- OMP_SCHEDULE environment variable
- ordered construct
- schedule clause
2.4.2 sections construct
The sections
directive identifies a noniterative work-sharing construct that specifies a set of constructs that are to be divided among threads in a team. Each section is executed once by a thread in the team. The syntax of the sections
directive is as follows:
#pragma omp sections [clause[[,] clause] ...] new-line
{
[#pragma omp section new-line]
structured-block
[#pragma omp section new-linestructured-block ]
...
}
The clause is one of the following:
private(
variable-list)
firstprivate(
variable-list)
lastprivate(
variable-list)
reduction(
operator:
variable-list)
nowait
Each section is preceded by a section
directive, although the section
directive is optional for the first section. The section
directives must appear within the lexical extent of the sections
directive. There's an implicit barrier at the end of a sections
construct, unless a nowait
is specified.
Restrictions to the sections
directive are as follows:
A
section
directive must not appear outside the lexical extent of thesections
directive.Only a single
nowait
clause can appear on asections
directive.
Cross-references
private
,firstprivate
,lastprivate
, andreduction
clauses (section 2.7.2)
2.4.3 single construct
The single
directive identifies a construct that specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread). The syntax of the single
directive is as follows:
#pragma omp single [clause[[,] clause] ...] new-linestructured-block
The clause is one of the following:
private(
variable-list)
firstprivate(
variable-list)
copyprivate(
variable-list)
nowait
There's an implicit barrier after the single
construct unless a nowait
clause is specified.
Restrictions to the single
directive are as follows:
- Only a single
nowait
clause can appear on asingle
directive. - The
copyprivate
clause must not be used with thenowait
clause.
Cross-references
private
,firstprivate
, andcopyprivate
clauses (section 2.7.2)
2.5 Combined parallel work-sharing constructs
Combined parallel work-sharing constructs are shortcuts for specifying a parallel region that has only one work-sharing construct. The semantics of these directives are the same as explicitly specifying a parallel
directive followed by a single work-sharing construct.
The following sections describe the combined parallel work-sharing constructs:
- parallel for directive
- parallel sections directive
2.5.1 parallel for construct
The parallel for
directive is a shortcut for a parallel
region that contains only a single for
directive. The syntax of the parallel for
directive is as follows:
#pragma omp parallel for [clause[[,] clause] ...] new-linefor-loop
This directive allows all the clauses of the parallel
directive and the for
directive, except the nowait
clause, with identical meanings and restrictions. The semantics are the same as explicitly specifying a parallel
directive immediately followed by a for
directive.
Cross-references
- parallel directive
- for directive
- Data attribute clauses
2.5.2 parallel sections construct
The parallel sections
directive provides a shortcut form for specifying a parallel
region that has only a single sections
directive. The semantics are the same as explicitly specifying a parallel
directive immediately followed by a sections
directive. The syntax of the parallel sections
directive is as follows:
#pragma omp parallel sections [clause[[,] clause] ...] new-line
{
[#pragma omp section new-line]
structured-block
[#pragma omp section new-linestructured-block ]
...
}
The clause can be one of the clauses accepted by the parallel
and sections
directives, except the nowait
clause.
Cross-references
2.6 Master and synchronization directives
The following sections describe:
- master construct
- critical construct
- barrier directive
- atomic construct
- flush directive
- ordered construct
2.6.1 master construct
The master
directive identifies a construct that specifies a structured block that's executed by the master thread of the team. The syntax of the master
directive is as follows:
#pragma omp master new-linestructured-block
Other threads in the team don't execute the associated structured block. There's no implied barrier either on entry to or exit from the master construct.
2.6.2 critical construct
The critical
directive identifies a construct that restricts execution of the associated structured block to a single thread at a time. The syntax of the critical
directive is as follows:
#pragma omp critical [(name)] new-linestructured-block
An optional name may be used to identify the critical region. Identifiers used to identify a critical region have external linkage and are in a name space that is separate from the name spaces used by labels, tags, members, and ordinary identifiers.
A thread waits at the beginning of a critical region until no other thread is executing a critical region (anywhere in the program) with the same name. All unnamed critical
directives map to the same unspecified name.
2.6.3 barrier directive
The barrier
directive synchronizes all the threads in a team. When encountered, each thread in the team waits until all of the others have reached this point. The syntax of the barrier
directive is as follows:
#pragma omp barrier new-line
After all threads in the team have encountered the barrier, each thread in the team begins executing the statements after the barrier directive in parallel. Because the barrier
directive doesn't have a C language statement as part of its syntax, there are some restrictions on its placement within a program. For more information about the formal grammar, see appendix C. The example below illustrates these restrictions.
/* ERROR - The barrier directive cannot be the immediate
* substatement of an if statement
*/
if (x!=0)
#pragma omp barrier
...
/* OK - The barrier directive is enclosed in a
* compound statement.
*/
if (x!=0) {
#pragma omp barrier
}
2.6.4 atomic construct
The atomic
directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads. The syntax of the atomic
directive is as follows:
#pragma omp atomic new-lineexpression-stmt
The expression statement must have one of the following forms:
- x binop
=
expr - x
++
++
x- x
--
--
x
In the preceding expressions:
x is an lvalue expression with scalar type.
expr is an expression with scalar type, and it doesn't reference the object designated by x.
binop isn't an overloaded operator and is one of
+
,*
,-
,/
,&
,^
,|
,<<
, or>>
.
Although it's implementation-defined whether an implementation replaces all atomic
directives with critical
directives that have the same unique name, the atomic
directive permits better optimization. Often hardware instructions are available that can perform the atomic update with the least overhead.
Only the load and store of the object designated by x are atomic; the evaluation of expr isn't atomic. To avoid race conditions, all updates of the location in parallel should be protected with the atomic
directive, except those that are known to be free of race conditions.
Restrictions to the atomic
directive are as follows:
- All atomic references to the storage location x throughout the program are required to have a compatible type.
Examples
extern float a[], *p = a, b;
/* Protect against races among multiple updates. */
#pragma omp atomic
a[index[i]] += b;
/* Protect against races with updates through a. */
#pragma omp atomic
p[i] -= 1.0f;
extern union {int n; float x;} u;
/* ERROR - References through incompatible types. */
#pragma omp atomic
u.n++;
#pragma omp atomic
u.x -= 1.0f;
2.6.5 flush directive
The flush
directive, whether explicit or implied, specifies a "cross-thread" sequence point at which the implementation is required to ensure that all threads in a team have a consistent view of certain objects (specified below) in memory. This means that previous evaluations of expressions that reference those objects are complete and subsequent evaluations haven't yet begun. For example, compilers must restore the values of the objects from registers to memory, and hardware may need to flush write buffers to memory and reload the values of the objects from memory.
The syntax of the flush
directive is as follows:
#pragma omp flush [(variable-list)] new-line
If the objects that require synchronization can all be designated by variables, then those variables can be specified in the optional variable-list. If a pointer is present in the variable-list, the pointer itself is flushed, not the object the pointer refers to.
A flush
directive without a variable-list synchronizes all shared objects except inaccessible objects with automatic storage duration. (This is likely to have more overhead than a flush
with a variable-list.) A flush
directive without a variable-list is implied for the following directives:
barrier
- At entry to and exit from
critical
- At entry to and exit from
ordered
- At entry to and exit from
parallel
- At exit from
for
- At exit from
sections
- At exit from
single
- At entry to and exit from
parallel for
- At entry to and exit from
parallel sections
The directive isn't implied if a nowait
clause is present. It should be noted that the flush
directive isn't implied for any of the following:
- At entry to
for
- At entry to or exit from
master
- At entry to
sections
- At entry to
single
A reference that accesses the value of an object with a volatile-qualified type behaves as if there were a flush
directive specifying that object at the previous sequence point. A reference that modifies the value of an object with a volatile-qualified type behaves as if there were a flush
directive specifying that object at the subsequent sequence point.
Because the flush
directive doesn't have a C language statement as part of its syntax, there are some restrictions on its placement within a program. For more information about the formal grammar, see appendix C. The example below illustrates these restrictions.
/* ERROR - The flush directive cannot be the immediate
* substatement of an if statement.
*/
if (x!=0)
#pragma omp flush (x)
...
/* OK - The flush directive is enclosed in a
* compound statement
*/
if (x!=0) {
#pragma omp flush (x)
}
Restrictions to the flush
directive are as follows:
- A variable specified in a
flush
directive must not have a reference type.
2.6.6 ordered construct
The structured block following an ordered
directive is executed in the order in which iterations would be executed in a sequential loop. The syntax of the ordered
directive is as follows:
#pragma omp ordered new-linestructured-block
An ordered
directive must be within the dynamic extent of a for
or parallel for
construct. The for
or parallel for
directive to which the ordered
construct binds must have an ordered
clause specified as described in section 2.4.1. In the execution of a for
or parallel for
construct with an ordered
clause, ordered
constructs are executed strictly in the order in which they would be executed in a sequential execution of the loop.
Restrictions to the ordered
directive are as follows:
- An iteration of a loop with a
for
construct must not execute the same ordered directive more than once, and it must not execute more than oneordered
directive.
2.7 Data environment
This section presents a directive and several clauses for controlling the data environment during the execution of parallel regions, as follows:
A threadprivate directive is provided to make file- scope, namespace-scope, or static block-scope variables local to a thread.
Clauses that may be specified on the directives to control the sharing attributes of variables for the duration of the parallel or work-sharing constructs are described in section 2.7.2.
2.7.1 threadprivate directive
The threadprivate
directive makes the named file-scope, namespace-scope, or static block-scope variables specified in the variable-list private to a thread. variable-list is a comma-separated list of variables that don't have an incomplete type. The syntax of the threadprivate
directive is as follows:
#pragma omp threadprivate(variable-list) new-line
Each copy of a threadprivate
variable is initialized once, at an unspecified point in the program prior to the first reference to that copy, and in the usual manner (i.e., as the master copy would be initialized in a serial execution of the program). Note that if an object is referenced in an explicit initializer of a threadprivate
variable, and the value of the object is modified prior to the first reference to a copy of the variable, then the behavior is unspecified.
As with any private variable, a thread must not reference another thread's copy of a threadprivate
object. During serial regions and master regions of the program, references will be to the master thread's copy of the object.
After the first parallel region executes, the data in the threadprivate
objects is guaranteed to persist only if the dynamic threads mechanism has been disabled and if the number of threads remains unchanged for all parallel regions.
The restrictions to the threadprivate
directive are as follows:
A
threadprivate
directive for file-scope or namespace-scope variables must appear outside any definition or declaration, and must lexically precede all references to any of the variables in its list.Each variable in the variable-list of a
threadprivate
directive at file or namespace scope must refer to a variable declaration at file or namespace scope that lexically precedes the directive.A
threadprivate
directive for static block-scope variables must appear in the scope of the variable and not in a nested scope. The directive must lexically precede all references to any of the variables in its list.Each variable in the variable-list of a
threadprivate
directive in block scope must refer to a variable declaration in the same scope that lexically precedes the directive. The variable declaration must use the static storage-class specifier.If a variable is specified in a
threadprivate
directive in one translation unit, it must be specified in athreadprivate
directive in every translation unit in which it's declared.A
threadprivate
variable must not appear in any clause except thecopyin
,copyprivate
,schedule
,num_threads
, or theif
clause.The address of a
threadprivate
variable isn't an address constant.A
threadprivate
variable must not have an incomplete type or a reference type.A
threadprivate
variable with non-POD class type must have an accessible, unambiguous copy constructor if it's declared with an explicit initializer.
The following example illustrates how modifying a variable that appears in an initializer can cause unspecified behavior, and also how to avoid this problem by using an auxiliary object and a copy-constructor.
int x = 1;
T a(x);
const T b_aux(x); /* Capture value of x = 1 */
T b(b_aux);
#pragma omp threadprivate(a, b)
void f(int n) {
x++;
#pragma omp parallel for
/* In each thread:
* Object a is constructed from x (with value 1 or 2?)
* Object b is copy-constructed from b_aux
*/
for (int i=0; i<n; i++) {
g(a, b); /* Value of a is unspecified. */
}
}
Cross-references
- dynamic threads
- OMP_DYNAMIC environment variable
2.7.2 Data-sharing attribute clauses
Several directives accept clauses that allow a user to control the sharing attributes of variables for the duration of the region. Sharing attribute clauses apply only to variables in the lexical extent of the directive on which the clause appears. Not all of the following clauses are allowed on all directives. The list of clauses that are valid on a particular directive are described with the directive.
If a variable is visible when a parallel or work-sharing construct is encountered, and the variable isn't specified in a sharing attribute clause or threadprivate
directive, then the variable is shared. Static variables declared within the dynamic extent of a parallel region are shared. Heap allocated memory (for example, using malloc()
in C or C++ or the new
operator in C++) is shared. (The pointer to this memory, however, can be either private or shared.) Variables with automatic storage duration declared within the dynamic extent of a parallel region are private.
Most of the clauses accept a variable-list argument, which is a comma-separated list of variables that are visible. If a variable referenced in a data-sharing attribute clause has a type derived from a template, and there are no other references to that variable in the program, the behavior is undefined.
All variables that appear within directive clauses must be visible. Clauses may be repeated as needed, but no variable may be specified in more than one clause, except that a variable can be specified in both a firstprivate
and a lastprivate
clause.
The following sections describe the data-sharing attribute clauses:
2.7.2.1 private
The private
clause declares the variables in variable-list to be private to each thread in a team. The syntax of the private
clause is as follows:
private(variable-list)
The behavior of a variable specified in a private
clause is as follows. A new object with automatic storage duration is allocated for the construct. The size and alignment of the new object are determined by the type of the variable. This allocation occurs once for each thread in the team, and a default constructor is invoked for a class object if necessary; otherwise the initial value is indeterminate. The original object referenced by the variable has an indeterminate value upon entry to the construct, must not be modified within the dynamic extent of the construct, and has an indeterminate value upon exit from the construct.
In the lexical extent of the directive construct, the variable references the new private object allocated by the thread.
The restrictions to the private
clause are as follows:
A variable with a class type that's specified in a
private
clause must have an accessible, unambiguous default constructor.A variable specified in a
private
clause must not have aconst
-qualified type unless it has a class type with amutable
member.A variable specified in a
private
clause must not have an incomplete type or a reference type.Variables that appear in the
reduction
clause of aparallel
directive can't be specified in aprivate
clause on a work-sharing directive that binds to the parallel construct.
2.7.2.2 firstprivate
The firstprivate
clause provides a superset of the functionality provided by the private
clause. The syntax of the firstprivate
clause is as follows:
firstprivate(variable-list)
Variables specified in variable-list have private
clause semantics, as described in section 2.7.2.1. The initialization or construction happens as if it were done once per thread, prior to the thread's execution of the construct. For a firstprivate
clause on a parallel construct, the initial value of the new private object is the value of the original object that exists immediately prior to the parallel construct for the thread that encounters it. For a firstprivate
clause on a work-sharing construct, the initial value of the new private object for each thread that executes the work-sharing construct is the value of the original object that exists prior to the point in time that the same thread encounters the work-sharing construct. In addition, for C++ objects, the new private object for each thread is copy constructed from the original object.
The restrictions to the firstprivate
clause are as follows:
A variable specified in a
firstprivate
clause must not have an incomplete type or a reference type.A variable with a class type that's specified as
firstprivate
must have an accessible, unambiguous copy constructor.Variables that are private within a parallel region or that appear in the
reduction
clause of aparallel
directive can't be specified in afirstprivate
clause on a work-sharing directive that binds to the parallel construct.
2.7.2.3 lastprivate
The lastprivate
clause provides a superset of the functionality provided by the private
clause. The syntax of the lastprivate
clause is as follows:
lastprivate(variable-list)
Variables specified in the variable-list have private
clause semantics. When a lastprivate
clause appears on the directive that identifies a work-sharing construct, the value of each lastprivate
variable from the sequentially last iteration of the associated loop, or the lexically last section directive, is assigned to the variable's original object. Variables that aren't assigned a value by the last iteration of the for
or parallel for
, or by the lexically last section of the sections
or parallel sections
directive, have indeterminate values after the construct. Unassigned subobjects also have an indeterminate value after the construct.
The restrictions to the lastprivate
clause are as follows:
All restrictions for
private
apply.A variable with a class type that's specified as
lastprivate
must have an accessible, unambiguous copy assignment operator.Variables that are private within a parallel region or that appear in the
reduction
clause of aparallel
directive can't be specified in alastprivate
clause on a work-sharing directive that binds to the parallel construct.
2.7.2.4 shared
This clause shares variables that appear in the variable-list among all the threads in a team. All threads within a team access the same storage area for shared
variables.
The syntax of the shared
clause is as follows:
shared(variable-list)
2.7.2.5 default
The default
clause allows the user to affect the data-sharing attributes of variables. The syntax of the default
clause is as follows:
default(shared | none)
Specifying default(shared)
is equivalent to explicitly listing each currently visible variable in a shared
clause, unless it's threadprivate
or const
-qualified. In the absence of an explicit default
clause, the default behavior is the same as if default(shared)
were specified.
Specifying default(none)
requires that at least one of the following must be true for every reference to a variable in the lexical extent of the parallel construct:
The variable is explicitly listed in a data-sharing attribute clause of a construct that contains the reference.
The variable is declared within the parallel construct.
The variable is
threadprivate
.The variable has a
const
-qualified type.The variable is the loop control variable for a
for
loop that immediately follows afor
orparallel for
directive, and the variable reference appears inside the loop.
Specifying a variable on a firstprivate
, lastprivate
, or reduction
clause of an enclosed directive causes an implicit reference to the variable in the enclosing context. Such implicit references are also subject to the requirements listed above.
Only a single default
clause may be specified on a parallel
directive.
A variable's default data-sharing attribute can be overridden by using the private
, firstprivate
, lastprivate
, reduction
, and shared
clauses, as demonstrated by the following example:
#pragma omp parallel for default(shared) firstprivate(i)\
private(x) private(r) lastprivate(i)
2.7.2.6 reduction
This clause performs a reduction on the scalar variables that appear in variable-list, with the operator op. The syntax of the reduction
clause is as follows:
reduction(
op :
variable-list )
A reduction is typically specified for a statement with one of the following forms:
- x
=
x op expr - x binop
=
expr - x
=
expr op x (except for subtraction) - x
++
++
x- x
--
--
x
where:
x
One of the reduction variables specified in the list.
variable-list
A comma-separated list of scalar reduction variables.
expr
An expression with scalar type that doesn't reference x.
op
Not an overloaded operator but one of +
, *
, -
, &
, ^
, |
, &&
, or ||
.
binop
Not an overloaded operator but one of +
, *
, -
, &
, ^
, or |
.
The following is an example of the reduction
clause:
#pragma omp parallel for reduction(+: a, y) reduction(||: am)
for (i=0; i<n; i++) {
a += b[i];
y = sum(y, c[i]);
am = am || b[i] == c[i];
}
As shown in the example, an operator may be hidden inside a function call. The user should be careful that the operator specified in the reduction
clause matches the reduction operation.
Although the right operand of the ||
operator has no side effects in this example, they're permitted, but should be used with care. In this context, a side effect that's guaranteed not to occur during sequential execution of the loop may occur during parallel execution. This difference can occur because the order of execution of the iterations is indeterminate.
The operator is used to determine the initial value of any private variables used by the compiler for the reduction and to determine the finalization operator. Specifying the operator explicitly allows the reduction statement to be outside the lexical extent of the construct. Any number of reduction
clauses may be specified on the directive, but a variable may appear in at most one reduction
clause for that directive.
A private copy of each variable in variable-list is created, one for each thread, as if the private
clause had been used. The private copy is initialized according to the operator (see the following table).
At the end of the region for which the reduction
clause was specified, the original object is updated to reflect the result of combining its original value with the final value of each of the private copies using the operator specified. The reduction operators are all associative (except for subtraction), and the compiler may freely reassociate the computation of the final value. (The partial results of a subtraction reduction are added to form the final value.)
The value of the original object becomes indeterminate when the first thread reaches the containing clause and remains so until the reduction computation is complete. Normally, the computation will be complete at the end of the construct; however, if the reduction
clause is used on a construct to which nowait
is also applied, the value of the original object remains indeterminate until a barrier synchronization has been performed to ensure that all threads have completed the reduction
clause.
The following table lists the operators that are valid and their canonical initialization values. The actual initialization value will be consistent with the data type of the reduction variable.
Operator | Initialization |
---|---|
+ |
0 |
* |
1 |
- |
0 |
& |
~0 |
| |
0 |
^ |
0 |
&& |
1 |
|| |
0 |
The restrictions to the reduction
clause are as follows:
The type of the variables in the
reduction
clause must be valid for the reduction operator except that pointer types and reference types are never permitted.A variable that's specified in the
reduction
clause must not beconst
-qualified.Variables that are private within a parallel region or that appear in the
reduction
clause of aparallel
directive can't be specified in areduction
clause on a work-sharing directive that binds to the parallel construct.#pragma omp parallel private(y) { /* ERROR - private variable y cannot be specified in a reduction clause */ #pragma omp for reduction(+: y) for (i=0; i<n; i++) y += b[i]; } /* ERROR - variable x cannot be specified in both a shared and a reduction clause */ #pragma omp parallel for shared(x) reduction(+: x)
2.7.2.7 copyin
The copyin
clause provides a mechanism to assign the same value to threadprivate
variables for each thread in the team executing the parallel region. For each variable specified in a copyin
clause, the value of the variable in the master thread of the team is copied, as if by assignment, to the thread-private copies at the beginning of the parallel region. The syntax of the copyin
clause is as follows:
copyin(
variable-list
)
The restrictions to the copyin
clause are as follows:
A variable that's specified in the
copyin
clause must have an accessible, unambiguous copy assignment operator.A variable that's specified in the
copyin
clause must be athreadprivate
variable.
2.7.2.8 copyprivate
The copyprivate
clause provides a mechanism to use a private variable to broadcast a value from one member of a team to the other members. It's an alternative to using a shared variable for the value when providing such a shared variable would be difficult (for example, in a recursion requiring a different variable at each level). The copyprivate
clause can only appear on the single
directive.
The syntax of the copyprivate
clause is as follows:
copyprivate(
variable-list
)
The effect of the copyprivate
clause on the variables in its variable-list occurs after the execution of the structured block associated with the single
construct, and before any of the threads in the team have left the barrier at the end of the construct. Then, in all other threads in the team, for each variable in the variable-list, that variable becomes defined (as if by assignment) with the value of the corresponding variable in the thread that executed the construct's structured block.
Restrictions to the copyprivate
clause are as follows:
A variable that's specified in the
copyprivate
clause must not appear in aprivate
orfirstprivate
clause for the samesingle
directive.If a
single
directive with acopyprivate
clause is encountered in the dynamic extent of a parallel region, all variables specified in thecopyprivate
clause must be private in the enclosing context.A variable that's specified in the
copyprivate
clause must have an accessible unambiguous copy assignment operator.
2.8 Directive binding
Dynamic binding of directives must adhere to the following rules:
The
for
,sections
,single
,master
, andbarrier
directives bind to the dynamically enclosingparallel
, if one exists, regardless of the value of anyif
clause that may be present on that directive. If no parallel region is currently being executed, the directives are executed by a team composed of only the master thread.The
ordered
directive binds to the dynamically enclosingfor
.The
atomic
directive enforces exclusive access with respect toatomic
directives in all threads, not just the current team.The
critical
directive enforces exclusive access with respect tocritical
directives in all threads, not just the current team.A directive can never bind to any directive outside the closest dynamically enclosing
parallel
.
2.9 Directive nesting
Dynamic nesting of directives must adhere to the following rules:
A
parallel
directive dynamically inside anotherparallel
logically establishes a new team, which is composed of only the current thread, unless nested parallelism is enabled.for
,sections
, andsingle
directives that bind to the sameparallel
aren't allowed to be nested inside each other.critical
directives with the same name aren't allowed to be nested inside each other. Note that this restriction isn't sufficient to prevent deadlock.for
,sections
, andsingle
directives aren't permitted in the dynamic extent ofcritical
,ordered
, andmaster
regions if the directives bind to the sameparallel
as the regions.barrier
directives aren't permitted in the dynamic extent offor
,ordered
,sections
,single
,master
, andcritical
regions if the directives bind to the sameparallel
as the regions.master
directives aren't permitted in the dynamic extent offor
,sections
, andsingle
directives if themaster
directives bind to the sameparallel
as the work-sharing directives.ordered
directives aren't allowed in the dynamic extent ofcritical
regions if the directives bind to the sameparallel
as the regions.Any directive that's permitted when executed dynamically inside a parallel region is also permitted when executed outside a parallel region. When executed dynamically outside a user-specified parallel region, the directive is executed by a team composed of only the master thread.