# OpenMP Model

The LLVM OpenMP Runtime is an integral component of the LLVM compiler
infrastructure that provides support for the OpenMP (Open Multi-Processing)
programming model.

OpenMP is a widely used API and set of directives for parallel programming,
allowing developers to write shared-memory multi-threaded applications more
easily.

This documentation is about an OpenMP runtime built on top of [nOS-V][nosv],
leveraging its thread management capabilities while retaining the fundamental
characteristics of the original runtime.

Although the modifications introduced to the runtime may appear minor, this
version is not API compatible with the original runtime. As a result, it is
mandatory to use the clang built in the same [LLVM Project][llvm].

This document describes the instrumentation features that nOS-V and OpenMP
include in the runtime. They monitor task execution and the execution flow
within the runtime library, so that it is possible to identify what is
happening. This data is useful for both users and developers of the OpenMP
runtime to analyze issues and undesired behaviors.

[llvm]: https://pm.bsc.es/llvm-ompss
[nosv]: https://gitlab.bsc.es/nos-v/nos-v

## How to Generate Execution Traces

To build the OpenMP runtime, nOS-V must be provided through the
`PKG_CONFIG_PATH` environment variable when configuring CMake. This results in a
runtime without instrumentation. However, the user can still generate
execution traces by enabling nOS-V instrumentation through
`NOSV_CONFIG_OVERRIDE="instrumentation.version=ovni"`. Note that this requires a
nOS-V installation built with ovni.

Building OpenMP with instrumentation requires adding the ovni pkg-config path to
`PKG_CONFIG_PATH`, along with a nOS-V installation that was also compiled with
ovni. This is because OpenMP depends on nOS-V to generate complete execution
traces.

By default, OpenMP does not instrument anything. To enable instrumentation, the
user must run the program with `OMP_OVNI=1` and
`NOSV_CONFIG_OVERRIDE="instrumentation.version=ovni"`.

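
The two variables above only need to be present in the environment of the
traced process. As a minimal sketch, assuming a POSIX system, a small launcher
could set them and then replace itself with the OpenMP program. The function
names `enable_ovni_env` and `run_traced` are hypothetical and are not part of
the runtime, nOS-V or ovni.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() and execvp() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Set the two variables named in the text above.
 * Returns 0 on success and -1 on error.
 * Note: this overwrites any NOSV_CONFIG_OVERRIDE value already set. */
int enable_ovni_env(void)
{
    if (setenv("OMP_OVNI", "1", 1) != 0)
        return -1;
    if (setenv("NOSV_CONFIG_OVERRIDE", "instrumentation.version=ovni", 1) != 0)
        return -1;
    return 0;
}

/* Hypothetical launcher: argv[0] is the OpenMP program to trace and argv
 * must be NULL-terminated. Only returns if something failed. */
int run_traced(char *const argv[])
{
    if (enable_ovni_env() != 0)
        return -1;
    execvp(argv[0], argv);
    perror("execvp"); /* reached only when the exec itself fails */
    return -1;
}
```

From a shell, the same effect is obtained by prefixing the command with both
assignments, for example
`OMP_OVNI=1 NOSV_CONFIG_OVERRIDE="instrumentation.version=ovni" ./app`, where
`./app` is a placeholder for the instrumented program.
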
The following sections describe the OpenMP execution trace views and the
information shown in each of them.

## nOS-V Task Type

As mentioned in the previous sections, this OpenMP runtime is built on top of
nOS-V, so the user can explore what the execution does there. Only the task type
view is described here. For other views, please take a look at the nOS-V
chapter.

In OpenMP, every thread that is launched (main thread included) is shown in a
task type with the label "openmp". In an application with tasks, every task
invocation is shown with a task type whose label follows the "file:line:col"
format, referring to the pragma location. This can be changed by using the
clause `label(string-literal)`.

An OpenMP task if0 is not shown in this view. Take a look at the "Limitations"
section for more information. Nevertheless, the OpenMP task view shows it.

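
To make the labelling rule concrete, here is a small sketch of a task
application. The function name `sum_squares_tasks` is made up for
illustration. It uses only standard OpenMP pragmas, so it also compiles (and
runs serially) without OpenMP support. The `label` clause mentioned above is
left out because, as far as I know, it is an extension and not standard
OpenMP.

```c
/* Sum of squares 1^2 + ... + n^2, with one OpenMP task per term. */
long sum_squares_tasks(int n)
{
    long total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 1; i <= n; i++) {
            /* Per the text above, each instance of this task would be
             * shown in the nOS-V task type view with a label built from
             * the file, line and column of this pragma. */
            #pragma omp task firstprivate(i) shared(total)
            {
                long sq = (long)i * i;
                #pragma omp atomic
                total += sq;
            }
        }
        /* Wait for all the tasks created above before leaving. */
        #pragma omp taskwait
    }
    return total;
}
```

All tasks are created from the same pragma, so under the rule described above
they would share a single task type. The threads running the parallel region
would appear under the "openmp" task type.
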
## OpenMP Subsystem

This view shows the activity of each thread using the following states:

- **Attached**: The thread is attached.

- **Join barrier**: The thread is in the implicit barrier of the parallel region.

- **Tasking barrier**: The thread is in the additional tasking barrier trying to
  execute tasks. This event happens when running with `KMP_TASKING=1`.

- **Spin wait**: The thread spin-waits for a condition. This event usually
  happens in a barrier, while waiting for the other threads to reach it. The
  thread also tries to execute tasks.

- **For static**: Executing a for static. The length of the event covers all the
  chunks of iterations executed by the thread. See the "Limitations" section.

- **For dynamic init**: Running the initialization of an OpenMP for dynamic.

- **For dynamic chunk**: Running a chunk of iterations of an OpenMP for dynamic.
  For example, if a thread executes two chunks of iterations, say from 1 to 4
  and from 8 to 12, two different events are shown. See the "Limitations"
  section.

- **Single**: Running a single region. All threads of the parallel region emit
  the event.

- **Release deps**: Trying to release dependencies when a task finishes. This
  event is emitted even if the task has no dependencies.

- **Taskwait deps**: Trying to execute tasks until dependencies have been
  fulfilled. This typically appears in a task if0 with dependencies or in a
  taskwait with deps.

- **Invoke task**: Executing a task.

- **Invoke task if0**: Executing a task if0.

- **Task alloc**: Allocating the task descriptor.

- **Task schedule**: Adding the task to the scheduler.

- **Taskwait**: Running a taskwait.

- **Taskyield**: Running a taskyield.

- **Task dup alloc**: Duplicating the task descriptor in a taskloop.

- **Check deps**: Checking if the task has pending dependencies to be fulfilled.
  If all dependencies are fulfilled, the task is scheduled.

- **Taskgroup**: Running a taskgroup.

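
As a rough illustration of where several of these states could come from, the
sketch below chains two dependent tasks inside a taskgroup. The function name
`chain_with_deps` is hypothetical. The comments map each construct to the
state it would most likely produce according to the list above. The exact set
and order of states in a real trace depends on the runtime and is not
guaranteed by this example.

```c
/* Computes (seed + 1) * 2 with two tasks ordered by a dependency on `a`. */
int chain_with_deps(int seed)
{
    int a = 0, b = 0;

    #pragma omp parallel shared(a, b)
    {
        #pragma omp single /* "Single", emitted by all threads */
        {
            #pragma omp taskgroup /* "Taskgroup" */
            {
                /* Creating a task: likely "Task alloc", "Check deps"
                 * and "Task schedule". */
                #pragma omp task depend(out: a) shared(a) firstprivate(seed)
                a = seed + 1; /* body runs under "Invoke task" */

                /* Second task reads `a`, so it waits for the first one.
                 * "Release deps" is expected when each task finishes. */
                #pragma omp task depend(in: a) depend(out: b) shared(a, b)
                b = a * 2;
            }
            #pragma omp taskwait /* "Taskwait" */
        }
    } /* end of parallel: "Join barrier", possibly "Spin wait" */
    return b;
}
```

The dependency on `a` forces the second task to run after the first, so the
result is the same with any number of threads.
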
## Limitations

Because of the way OpenMP is implemented, there are some instrumentation points
that violate the ovni subsystem rules. This mostly happens because some
directives are partially lowered into the transformed user code, so it is not
easy to wrap them in a single-entry single-exit (SESE) region, as would be done
with a regular task invocation, for example.

All problematic directives are described here so that the user can understand
what is being shown in the traces:

- **Task if0**: The lowered user code of a task if0 is:

      ... = __kmpc_omp_task_alloc(...);
      __kmpc_omp_taskwait_deps_51(...); // If task has dependencies
      __kmpc_omp_task_begin_if0(...);
      // Call to the user code
      omp_task_entry_(...);
      __kmpc_omp_task_complete_if0(...);

  Ideally, `omp_task_entry` should be called by the runtime to ensure the SESE
  structure. Because this code is generated by the compiler, it is assumed that
  instrumenting `__kmpc_omp_task_begin_if0` and `__kmpc_omp_task_complete_if0`
  as entry/exit points is safe and equivalent.

- **For static**: The lowered user code of a for static is:

      // Parallel code
      __kmpc_for_static_init_4(...);
      for ( i = ...; i <= ...; ++i )
          ;
      __kmpc_for_static_fini(...);

  Ideally, the for loop should be called by the runtime to ensure the SESE
  structure. Because this code is generated by the compiler, it is assumed that
  instrumenting `__kmpc_for_static_init_4` and `__kmpc_for_static_fini` as
  entry/exit points is safe and equivalent.

- **For dynamic**: The lowered user code of a for dynamic is:

      __kmpc_dispatch_init_4(...);
      while ( __kmpc_dispatch_next_4(...))
      {
          for ( i = ...; i <= ...; ++i )
              ;
      }

  Ideally, the for loop should be called by the runtime to ensure the SESE
  structure. Because this code is generated by the compiler, the subsystem view
  shows:

  1. How long it takes to run `__kmpc_dispatch_init_4`, with the event
     **For dynamic init**.
  2. How long it takes to run from the end of step 1 to the first
     `__kmpc_dispatch_next_4` call, with the event **For dynamic chunk**.
  3. How long it takes to run a loop iteration chunk between the last
     `__kmpc_dispatch_next_4` call and the previous one, with the event
     **For dynamic chunk**.

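
For reference, the three lowered patterns above would come from user code
along these lines. This is a sketch with a hypothetical function name,
`lowered_examples`. The mapping to the `__kmpc_*` calls in the comments is
taken from the descriptions above and has not been checked against the output
of a specific compiler version.

```c
/* Runs one static loop, one dynamic loop and one task if0, and returns
 * the sum of what each of them computed. */
long lowered_examples(int n)
{
    long s_static = 0, s_dynamic = 0, s_if0 = 0;

    #pragma omp parallel
    {
        /* Expected to lower to __kmpc_for_static_init_4 ... loop ...
         * __kmpc_for_static_fini, shown as one "For static" event. */
        #pragma omp for schedule(static) reduction(+:s_static)
        for (int i = 0; i < n; i++)
            s_static += i;

        /* Expected to lower to __kmpc_dispatch_init_4 followed by a
         * while loop on __kmpc_dispatch_next_4, shown as one
         * "For dynamic init" and one "For dynamic chunk" per chunk. */
        #pragma omp for schedule(dynamic, 4) reduction(+:s_dynamic)
        for (int i = 0; i < n; i++)
            s_dynamic += 2L * i;

        #pragma omp single
        {
            /* Task if0: the body is executed right away by the
             * encountering thread, between the begin_if0 and
             * complete_if0 calls shown above. */
            #pragma omp task if(0) shared(s_if0)
            s_if0 = n;
        }
    }
    return s_static + s_dynamic + s_if0;
}
```

With `n = 10` and a chunk size of 4, the dynamic loop is split into three
chunks (4, 4 and 2 iterations), so three **For dynamic chunk** events would be
expected in total across the threads, in addition to those described in step 2
above.
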