ovni/doc/user/emulation/nanos6.md

241 lines
9.3 KiB
Markdown
Raw Normal View History

# Nanos6 model
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
The Nanos6 runtime library implements the OmpSs-2 tasking model, which
schedules the execution of tasks with dependencies. For more information
see the [OmpSs-2 website][oss] and the [Nanos6 repository][nanos6].
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
[oss]: https://pm.bsc.es/ompss-2
[nanos6]: https://github.com/bsc-pm/nanos6
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
The library is instrumented to track the execution of tasks and also the
execution path inside the runtime library to identify what is happening.
This information is typically used by both the users and the developers
of the Nanos6 library to analyze problems and unwanted behaviors.
Towards this goal, five different Paraver views are generated, which are
2022-08-31 11:31:55 +02:00
explained below.
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
The state of each task is modelled in a simple finite state machine,
which identifies the main state changes of the task. The task is set to
the *Running* state when it begins executing the body of the task,
2022-08-31 11:31:55 +02:00
consisting of user defined code. The states can be observed in the
following diagram:
![Nanos6 task states](fig/nanos6-task-model.svg)
A thread can only execute one task at a time, but multiple tasks can be
nested and only the topmost one will actually run. When a view presents
the running task of a thread, that one is always the top of the stack.
2022-08-31 11:31:55 +02:00
## Task ID view
The task ID view represents the numeric ID of the Nanos6 task that is
currently running on each thread. The ID is a monotonically increasing
identifier assigned on task creation. Lower IDs correspond to tasks
created at an earlier point than higher IDs.
2022-08-31 11:31:55 +02:00
## Task type view
2022-08-29 13:41:28 +02:00
Every task in Nanos6 contains a task type, which roughly corresponds to
the actual location in the code a task was declared. For example if a
2022-08-31 11:31:55 +02:00
function is declared as a Nanos6 task, and it is called multiple times
in a program, every created task will have a different ID, but the same
type.
2022-08-29 13:41:28 +02:00
In the view, each type is shown with a label declared in the source with
the label attribute of the running task. If no label was specified, one
is automatically generated for each type.
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
Note that in this view, the numeric event value is a hash function of
the type label, so two distinct types (tasks declared in different parts
of the code) with the same label will share the event value and have the
same color.
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
## MPI rank view
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
This view shows the numeric MPI rank of the process running the current
task. It is only shown when the task is in the running state. This view
is specially useful to identify task in a distributed workload, which
spans several nodes.
2022-08-29 13:41:28 +02:00
As the zero value in Paraver gets hidden, we use the rank+1 value
instead. Therefore the rank numeric value go from 1 to the number of
ranks (inclusive).
## Thread type view
This view shows the type of each thread:
- Main: the first thread executed before the main() function even
begins.
- Leader: helps with the execution of the main().
- Worker: the most common threads, these are in charge of queueing
and running tasks.
- External: used for external threads that attach to Nanos6 (currently
there are not in use).
2023-02-28 10:08:30 +01:00
## Idle view
2023-03-27 11:47:46 +02:00
The idle view shows when CPUs become *Idle*. This state is displayed when no
thread is *Running* in the CPU or when a worker is marked as Stalled (not making
progress) such as when busy waiting but still *Running* in the CPU. The CPUs can
also be marked as *Absorbing noise*, when they are running a worker in the sponge
mode, which leaves the CPU free to absorb noise from system processes and
interruptions to avoid disruption in the other CPUs.
2023-03-27 11:47:46 +02:00
In particular, a worker requesting a new task will become Stalled after entering
the delegation lock and performing a complete iteration without receiving work.
2022-08-29 13:41:28 +02:00
## Subsystem view
2022-08-31 11:31:55 +02:00
The subsystem view attempts to provide a general overview of what Nanos6
is doing at any point in time. This view is more complex to understand
than the others but is generally the most useful to understand what is
happening and debug problems related with Nanos6 itself.
2022-08-29 13:41:28 +02:00
The view shows the state of the runtime for each thread (and for each
CPU, the state of the running thread in that CPU).
The state is computed by the following method: the runtime code is
2022-08-31 11:31:55 +02:00
completely divided into sections of code (machine instructions) $`S_1,
S_2, \ldots, S_N`$, which are instrumented (an event is emitted when entering
and exiting each section), and one common section of code which is
shared across the subsystems, $`U`$, of no interest. We also assume any
other code not belonging to the runtime belongs to the $`U`$ section.
2022-08-29 13:41:28 +02:00
!!! remark
Every instruction of the runtime belongs to *exactly one section*.
2022-08-29 13:41:28 +02:00
To determine the state of a thread, we look into the stack to see what
is the top-most instrumented section.
At any given point in time, a thread may be executing code with a stack
2022-08-31 11:31:55 +02:00
that spawns multiple sections, for example $`[ S_1, U, S_2, S_3, U ]`$
(the last is current stack frame). The subsystem view selects the last
subsystem section from the stack ignoring the common section $`U`$, and
presents that section as the current state of the execution, in this
case the section $`S_3`$.
2022-08-29 13:41:28 +02:00
2022-08-31 11:31:55 +02:00
Additionally, the runtime sections $`S_i`$ are grouped together in
2022-09-05 20:17:56 +02:00
subsystems, which form a closely related group of functions. When there is no
instrumented section in the thread stack, the state is set to **No subsystem**.
The complete list of subsystems and sections is shown below.
- **Task subsystem**: Controls the life cycle of tasks
- **Running body**: Executing the body of the task (user defined code).
2022-09-05 20:17:56 +02:00
- **Running task for**: Running the body of the task in a task for (one of
the collaborators).
2022-09-05 20:17:56 +02:00
- **Spawning function**: Spawning a function as task that will be submitted
for later execution.
- **Creating**: Creating a new task via `nanos6_create_task`
- **Submitting**: Submitting a recently created task via
`nanos6_submit_task`
- **Scheduler subsystem**: Queueing and dequeueing ready tasks.
- **Serving tasks**: Inside the scheduler lock, serving tasks to
other workers.
2022-09-05 20:17:56 +02:00
- **Adding ready tasks**: Adding tasks to the scheduler queues, but
outside of the scheduler lock.
- **Processing ready tasks**: Moving tasks from the lock-free add
queues to the scheduler queues.
2022-09-05 20:17:56 +02:00
- **Worker subsystem**: Actions that relate to worker threads, which
continuously try to execute new tasks.
- **Looking for work**: Actively requesting tasks from the scheduler,
registered but not holding the lock.
- **Handling task**: Processing a recently assigned task.
- **Switching to another thread**: Switching to another thread via
switchTo().
- **Migrating CPU**: Changing the CPU of the current worker.
- **Suspending thread**: The current thread is in the suspend()
function.
- **Resuming another thread**: Inside the resume() function,
wakening another thread.
- **Memory**: Manages the allocation and deallocation of memory.
- **Allocating**: Allocating new memory. This can take very long
with jemalloc in the first calls from each thread.
- **Freeing**: Deallocating memory.
2022-09-05 20:17:56 +02:00
- **Dependency subsystem**: Manages the registration of task
dependencies.
- **Registering**: Registering dependencies of a task
- **Unregistering**: Releasing dependencies of a task because
it has ended
- **Blocking subsystem**: Code that stops the thread execution.
- **Taskwait**: Task is blocked due to a `taskwait` clause
- **Blocking current task**: Task is blocked through the Nanos6
blocking API
- **Unblocking remote task**: Unblocking a different task using the
Nanos6 blocking API
- **Wait for deadline**: Blocking a deadline task, which will be
re-enqueued when a certain amount of time has passed
!!! remark
Notice that tasks in the running state will display the "Task: Running
body" subsystem only when the task has not called any other instrumented
subsystem in Nanos6. Tasks will continue to be in the running state until
paused or finished.
2023-03-27 11:47:33 +02:00
## Breakdown view (experimental)
The breakdown view shows the **number** of CPUs in a given state, stacking
similar states together. Its a mix of the task type, subsystem and idle views.
!!! Important
This view is only generated with the `-b` option of ovniemu. This
feature is experimental, new changes may break compatibility.
Here is an example of the Heat mini-app.
![Breakdown example](fig/breakdown.png)
The *Idle* state shows when a CPU has no threads running or when the worker is
Stalled (is not making progress). Below the Idle state the tasks are shown by
their label, in this case only *block computation* is running. Above the Idle
state, the runtime subsystems are shown.
This view gives a quick overview of the usage of the CPUs over time and which
tasks are taking more time. It also shows which sections are unable to use all
the CPUs, showing potential problems with the parallelization. In this example
not all the CPUs can be filled with *block computation* tasks.
## Limitations
The task for clause is partially supported, as currently the emulator uses a
simplified tasking model where a task can only be executed by one thread, and
only once. However, the Nanos6 runtime can run the body of a single task
multiple times by varying the arguments with the task for clause, which breaks
the emulation model.
The task for is currently only shown in the subsystem view, but it doesn't
2023-02-14 16:38:43 +01:00
appear as running any task in the other views.