OpenMP

An API for Writing Multithreaded Applications

  • A set of compiler directives and library routines for parallel application programmers

Common API

int omp_get_num_threads();   // Number of threads in the team
int omp_get_thread_num();    // Thread ID or rank
double omp_get_wtime();      // Time in seconds since a fixed point in the past
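
For example, the three routines can be combined to time a parallel region and report each thread's rank (a minimal sketch; the request for 4 threads is illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double start = omp_get_wtime();           // wall-clock seconds since a fixed point
        #pragma omp parallel num_threads(4)       // illustrative team size
        {
            int id = omp_get_thread_num();        // this thread's rank, 0 .. team size - 1
            int nthrds = omp_get_num_threads();   // number of threads in the team
            printf("thread %d of %d\n", id, nthrds);
        }
        printf("elapsed: %f s\n", omp_get_wtime() - start);
        return 0;
    }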

OpenMP core syntax

  • Most of the constructs in OpenMP are compiler directives.
    #pragma omp construct [clause [clause] ]  
    //Example
    #pragma omp parallel num_threads(4)
    
  • Function prototypes and types in the file:
    #include <omp.h>
    
  • Most OpenMP constructs apply to a structured block.
    • Structured block: a block of one or more statements with one point of entry at the top and one point of exit at the bottom.
    • It’s OK to have an exit() within the structured block.

How do threads interact?

  • OpenMP is a multi-threading, shared address model.
    • Threads communicate by sharing variables.
    • Unintended sharing of data causes race conditions:

race condition:

when the program’s outcome changes as the threads are scheduled differently.

  • To control race conditions:
    • Use synchronization to protect data conflicts.
    • Synchronization is expensive, so change how data is accessed to minimize the need for it (see the sketch below).
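
As an illustration, a sketch of a race on a shared counter and one way to protect it (the critical construct is described below):

    int count = 0;              // shared by every thread in the team

    #pragma omp parallel
    {
        // count++;             // unprotected update: a race, the result depends on scheduling
        #pragma omp critical
        count++;                // protected update: the final value is always the team size
    }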

Thread Creation: Parallel Regions

  • You create threads in OpenMP with the parallel construct.
  • Each thread executes the same code redundantly (see the sketch below).
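
A sketch of a parallel region, assuming <stdio.h> and <omp.h> are included; every thread in the team executes the block, and an implicit barrier joins them at the end:

    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
        printf("block executed by thread %d\n", id);   // the same code runs in every thread
    }   // implicit barrier: all threads join here before the program continues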

Synchronization

Synchronization is used to impose order constraints and to protect access to shared data

  • High level synchronization:
    • critical
    • atomic
    • barrier
    • ordered
  • Low level synchronization
    • flush
    • locks (both simple and nested)

Synchronization: critical

  • Mutual exclusion: Only one thread at a time can enter a critical region.

    float res;
    #pragma omp parallel
    {
        float B; int i, id, nthrds;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();

        for (i = id; i < niters; i += nthrds) {   // niters: total iteration count, defined elsewhere
            B = big_job(i);
            #pragma omp critical
            consume(B, res);                      // only one thread at a time updates res
        }
    }
    

Synchronization: atomic

  • Atomic provides mutual exclusion, but only for the update of a single memory location.
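
For example (a sketch; do_work is a placeholder for some per-thread computation):

    double sum = 0.0;
    #pragma omp parallel
    {
        double tmp = do_work(omp_get_thread_num());   // placeholder computation
        #pragma omp atomic
        sum += tmp;                                   // only this single update of sum is protected
    }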

Synchronization: Barrier

  • Barrier: Each thread waits until all threads arrive.
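
For example, an explicit barrier can separate two phases so that no thread reads results before every thread has written them (A, B, big_calc1, and big_calc2 are placeholders):

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        A[id] = big_calc1(id);            // phase 1: each thread writes its own slot
        #pragma omp barrier               // wait until every A[id] has been written
        B[id] = big_calc2(id, A);         // phase 2: may now read any element of A
    }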

Synchronization: ordered

  • The ordered region executes in the sequential order.
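
For example (a sketch using the loop construct covered below; compute and n are placeholders), the loop body still runs in parallel, but the ordered block executes in sequential iteration order:

    #pragma omp parallel for ordered
    for (int i = 0; i < n; i++) {
        double v = compute(i);            // iterations execute in parallel
        #pragma omp ordered
        printf("%d %f\n", i, v);          // this block runs in loop order
    }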

Work sharing

Splitting up pathways through the code among the threads within a team is called work sharing. Methods:

  • Loop construct
  • Sections/section constructs
  • Single construct
  • Task construct

Work Sharing with Loops

Basic approach

  • Find compute intensive loops
  • Make the loop iterations independent, so they can safely execute in any order without loop-carried dependencies
  • Place the appropriate OpenMP directive and test (see the sketch below)
  • Note: loop index “i” is private by default
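
A sketch of the basic approach (a, b, and N are placeholders; the iterations are independent):

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {     // the loop index i is private by default
        a[i] = a[i] + b[i];           // independent iterations: any order is safe
    }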

Work Sharing with Master Construct

The master construct denotes a structured block that is only executed by the master thread.

  • The other threads just skip it (no synchronization is implied).
    #pragma omp parallel
    {
        do_many_things();
        #pragma omp master
        { exchange_boundaries(); }    // only the master thread executes this
        #pragma omp barrier           // explicit barrier; master itself implies no synchronization
        do_many_other_things();
    }
    

Work Sharing with Single Construct

The single construct denotes a block of code that is executed by only one thread (not necessarily the master thread).

  • A barrier is implied at the end of the single block (the barrier can be removed with a nowait clause).

    #pragma omp parallel
    {
        do_many_things();
        #pragma omp single
        { exchange_boundaries(); }    // exactly one thread executes this; the others wait at the implied barrier
        do_many_other_things();
    }
    

Reduction

OpenMP reduction clause: reduction (op : list)

  • We are combining values into a single accumulation variable (e.g., ave when computing an average)... there is a true dependence between loop iterations that cannot be trivially removed (see the sketch after this list).
  • Support for reduction operations is included in most parallel programming environments.
  • Inside a parallel or a work-sharing construct:
    • A local copy of each list variable is made and initialized depending on the op (e.g. 0 for “+”).
    • Compiler finds standard reduction expressions containing op and uses them to update the local copy.
    • Local copies are reduced into a single value and combined with the original global value.
    • The variables in “list” must be shared in the enclosing parallel region.
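
For example, accumulating an average with a + reduction (A and MAX are placeholders):

    double ave = 0.0;
    #pragma omp parallel for reduction(+:ave)
    for (int i = 0; i < MAX; i++) {
        ave += A[i];                  // each thread accumulates into its own private copy of ave
    }
    ave = ave / MAX;                  // the private copies have been combined into the shared ave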

Reduction operands/initial-values

Each private copy is initialized according to the operator:

  • +  : 0
  • *  : 1
  • -  : 0
  • &  : ~0
  • |  : 0
  • ^  : 0
  • && : 1
  • || : 0

Data environment: Default storage attributes

  • Shared Memory programming model:
    • Most variables are shared by default
  • Global variables are SHARED among threads
    • C: file-scope variables and static variables; dynamically allocated memory (malloc, new) is also shared
  • But not everything is shared...
    • Stack variables in functions called from parallel regions are PRIVATE
    • Automatic variables within a statement block are PRIVATE.

Changing storage attributes

One can selectively change storage attributes within constructs using the following clauses:

  • SHARED
  • PRIVATE: private(var) creates a new local copy of var for each thread.
  • FIRSTPRIVATE: Initializes each private copy with the corresponding value from the master thread.
  • LASTPRIVATE: passes the value of the private copy from the sequentially last iteration back to the original variable after the construct.
  • All the clauses on this page apply to the OpenMP construct, NOT to the entire region (see the sketch below).
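
A sketch of firstprivate and lastprivate on a loop construct (N is a placeholder):

    int tmp = 5;                     // value set in the master thread
    #pragma omp parallel for firstprivate(tmp) lastprivate(tmp)
    for (int i = 0; i < N; i++) {
        tmp = tmp + i;               // each thread's private copy starts at 5 (firstprivate)
    }
    // tmp now holds the private value from the sequentially last iteration (lastprivate)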

Data Sharing: Default Clause

  • The default storage attribute is DEFAULT(SHARED), so there is no need to specify it. Exception: #pragma omp task.
  • DEFAULT(NONE): no default for variables in the static extent; the storage attribute must be listed for each variable in the static extent. Good programming practice! (See the sketch below.)
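
For example, with default(none) every referenced variable's attribute must be stated explicitly (n is a placeholder; the loop index is predetermined private):

    double sum = 0.0;
    #pragma omp parallel for default(none) shared(n) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += 1.0 / (i + 1);         // forgetting a clause becomes a compile-time error
    }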

Sections worksharing Construct

  • The Sections worksharing construct gives a different structured block to each thread.
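
For example (x_calculation, y_calculation, and z_calculation are placeholders), each section is handed to a different thread:

    #pragma omp parallel sections
    {
        #pragma omp section
        x_calculation();              // one thread runs this section
        #pragma omp section
        y_calculation();              // another thread (if available) runs this one
        #pragma omp section
        z_calculation();
    }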

Loop worksharing constructs: The schedule clause

  • schedule(static [,chunk])
    • Deal-out blocks of iterations of size “chunk” to each thread.
  • schedule(dynamic[,chunk])
    • Each thread grabs “chunk” iterations off a queue until all iterations have been handled.
  • schedule(guided[,chunk])
    • Threads dynamically grab blocks of iterations.
    • The size of the block starts large and shrinks down to size “chunk” as the calculation proceeds.
  • schedule(runtime)
    • Schedule and chunk size taken from the OMP_SCHEDULE environment variable (or the runtime library for OpenMP 3.0)
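
For example (work and N are placeholders), a dynamic schedule with chunk size 4; schedule(runtime) would instead take the choice from the environment, e.g. export OMP_SCHEDULE="dynamic,4":

    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < N; i++) {
        work(i);                      // threads grab chunks of 4 iterations from a shared queue
    }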
