# nOS-V model

The [nOS-V library][nosv] implements a user-space runtime that can schedule
tasks to run on multiple CPUs. The nOS-V library is instrumented to
track the internal state of the runtime and to emit information
about the tasks that are running.

[nosv]: https://github.com/bsc-pm/nos-v

## Task model

The nOS-V runtime is composed of tasks that can be scheduled to run in
threads. Tasks can be paused and resumed, leaving the CPUs free to
execute other tasks.

In nOS-V, parallel tasks can also be scheduled multiple times, and the
same task may run concurrently on several CPUs. To model this scenario,
we introduce the concept of *body*: each execution of the same task maps
to a body with a unique body id.

A normal task only has one body, while a parallel task (created with
`TASK_FLAG_PARALLEL`) can have more than one body. Each body holds the
execution state, and can transition to different execution states
following this state diagram:

Bodies begin in the Created state and transition to Running when they
begin execution. Bodies that can be paused (created with the flag
`BODY_FLAG_PAUSE`) can transition to the Paused state.

Additionally, bodies created with the `BODY_FLAG_RESURRECT` flag can run
multiple times by transitioning from Dead back to Running. This
transition is required to model the tasks that implement the taskiter in
NODES, which are submitted multiple times for execution reusing the
same task id and body id. Every time a body runs again, its iteration
number is increased.

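The transitions described in this section can be summarized with a small sketch. It is only an illustration and not the emulator's implementation: the `Body`, `State` and `BodyFlag` names are hypothetical, the Running to Dead transition is inferred from the text, and the initial iteration number of 0 is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, Flag


class State(Enum):
    CREATED = 1
    RUNNING = 2
    PAUSED = 3
    DEAD = 4


class BodyFlag(Flag):
    NONE = 0
    PAUSE = 1      # stands in for BODY_FLAG_PAUSE
    RESURRECT = 2  # stands in for BODY_FLAG_RESURRECT


@dataclass
class Body:
    body_id: int
    flags: BodyFlag = BodyFlag.NONE
    state: State = State.CREATED
    iteration: int = 0  # assumed starting value

    def _move(self, allowed_from, new_state):
        # Reject any transition that is not described in the model.
        if self.state not in allowed_from:
            raise ValueError(
                f"body {self.body_id}: {self.state.name} -> "
                f"{new_state.name} is not allowed"
            )
        self.state = new_state

    def execute(self):
        # Created -> Running, or Dead -> Running for resurrectable bodies.
        if self.state is State.DEAD:
            if BodyFlag.RESURRECT not in self.flags:
                raise ValueError("body was not created with the resurrect flag")
            self.iteration += 1  # the body runs again
            self.state = State.RUNNING
        else:
            self._move({State.CREATED}, State.RUNNING)

    def pause(self):
        if BodyFlag.PAUSE not in self.flags:
            raise ValueError("body was not created with the pause flag")
        self._move({State.RUNNING}, State.PAUSED)

    def resume(self):
        self._move({State.PAUSED}, State.RUNNING)

    def end(self):
        self._move({State.RUNNING}, State.DEAD)
```

A body created with both flags can go through `execute()`, `pause()`, `resume()`, `end()` and `execute()` again, ending in Running with its iteration number increased by one. A body without the pause flag raises an error on `pause()`.
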
## Task type colors

In the Paraver timeline, the color assigned to each nOS-V task type is
computed from the task type label using a hash function; the task type
id has no effect on the assigned color. This method provides two
desirable properties:

- Invariant type colors over time: the order in which task types are
  created doesn't affect their color.

- Deterministic colors among threads: task types with the same label end
  up mapped to the same color, even if they are from different threads
  located in different nodes.

For more details, see [this MR][1].

[1]: https://pm.bsc.es/gitlab/rarias/ovni/-/merge_requests/27

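As a rough illustration of why hashing the label gives both properties, here is a minimal sketch. The hash function (CRC32), the palette size and the `type_color` name are assumptions chosen for the example; they are not the ones used by the emulator.

```python
import zlib

# Assumed number of colors in the palette (hypothetical value).
PALETTE_SIZE = 64


def type_color(label: str, palette_size: int = PALETTE_SIZE) -> int:
    """Map a task type label to a color index using a stable hash.

    Only the label is used, so the task type id and the creation order
    cannot influence the result.
    """
    digest = zlib.crc32(label.encode("utf-8"))
    return digest % palette_size
```

Any hash that is stable across processes and machines would serve the same purpose. Python's built-in `hash()` for strings is not suitable here, because it is randomized per process by default.
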
## Subsystem view

The subsystem view provides a simplified view of what the nOS-V
runtime is doing over time. The view follows the same rules described in
the [subsystem view of Nanos6](nanos6.md/#subsystem_view).

## Idle view

The idle view shows the progress state of the running threads:
*Progressing* or *Resting*. The *Progressing* state is shown when they
are making useful progress and the *Resting* state when they are waiting
for work. By definition, workers begin in the Progressing state when
they start running, and the following situations make them transition
to Resting:

- When workers are waiting in the delegation lock after some spins or
  when instructed to go to sleep.
- When the server is trying to serve tasks, but there are no more tasks
  available.

They go back to Progressing as soon as they receive work. The
specific points at which they do so can be found in the [nOS-V source
code](https://gitlab.bsc.es/nos-v/nos-v) by looking at the
`instr_worker_resting()` and `instr_worker_progressing()` trace points.

This view is intended to detect parts of the execution during which the
workers don't have work, typically because the application doesn't have
enough parallelism or the scheduler is unable to serve work fast enough.

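To make the purpose of the view concrete, the following sketch computes which fraction of a worker's time was spent resting. The event format, a list of `(timestamp, state)` pairs, is hypothetical and does not correspond to the ovni trace format.

```python
def resting_fraction(events, end_time):
    """Return the fraction of time spent in the "resting" state.

    events: list of (timestamp, state) pairs sorted by timestamp, where
            state is "progressing" or "resting" (hypothetical format).
    end_time: timestamp at which the last state ends.
    """
    resting = 0
    total = 0
    # Pair each event with the next one to get the duration of each state.
    boundaries = events[1:] + [(end_time, None)]
    for (start, state), (stop, _) in zip(events, boundaries):
        span = stop - start
        total += span
        if state == "resting":
            resting += span
    return resting / total if total else 0.0
```

For example, a worker that progresses from time 0 to 40, rests until 70 and then progresses until 100 has spent 30 out of 100 time units resting.
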
## Breakdown view

The breakdown view displays a summary of what is happening in all CPUs
by combining the subsystem, idle and task type views in a single timeline.
Specifically, it shows how many CPUs are resting as defined by the idle
view, how many are inside a given task by showing the task type label,
and how many are in a particular subsystem of the runtime.

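The aggregation can be pictured as counting, at a given instant, how many CPUs fall under each label. The sketch below is an assumption-laden illustration: the per-CPU dictionaries, the `cpu_label` and `breakdown` names, and the priority order (resting first, then task type, then subsystem) are all hypothetical and not taken from the emulator.

```python
from collections import Counter


def cpu_label(cpu):
    """Pick the single label shown for one CPU.

    The priority order used here is an assumption for illustration.
    """
    if cpu.get("resting"):
        return "Resting"
    if cpu.get("task_type"):
        return cpu["task_type"]
    return cpu.get("subsystem", "Unknown")


def breakdown(cpus):
    """Count how many CPUs are in each state at one instant."""
    return Counter(cpu_label(cpu) for cpu in cpus)
```

With four CPUs where one is resting, two run tasks of a type labeled `heat_step` and one is inside a subsystem, `breakdown()` returns a count of 1, 2 and 1 for the corresponding labels, which always add up to the number of CPUs.
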
!!! Important

    You must specify *ovni.level = 3* or higher in *nosv.toml* and pass
    the *-b* option to ovniemu to generate the breakdown view.

Notice that the vertical axis shows the **number**
of CPUs in that state, not the physical CPUs like other views.
Here is an example of the Heat mini-app:

## Hardware counters (HWC) view

The hardware counter view allows you to see the *delta* of a given set of
hardware counters over time. The counters are read at the beginning and end of
tasks as well as at some nOS-V API methods.

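As a sketch of what a *delta* means here, the following function subtracts consecutive readings of each counter. It assumes that each reading holds cumulative counter values and that the delta is attributed to the later reading; both points, as well as the `counter_deltas` name and the input format, are assumptions made for the example.

```python
def counter_deltas(readings):
    """Compute per-counter deltas between consecutive readings.

    readings: list of (timestamp, {counter_name: cumulative_value})
              sorted by timestamp (hypothetical format).
    Returns a list of (timestamp, {counter_name: delta}) with one entry
    per reading after the first.
    """
    deltas = []
    for (_, previous), (timestamp, current) in zip(readings, readings[1:]):
        delta = {name: current[name] - previous[name] for name in current}
        deltas.append((timestamp, delta))
    return deltas
```

For instance, if `PAPI_TOT_INS` reads 1000 when a task begins and 1500 when it ends, the delta shown for that interval would be 500 instructions.
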
To enable support for HWC in nOS-V, use at least level 2 in `ovni.level` or
enable the "hwc" event set in nosv.toml. Then, make sure the
`hwcounters.backend` option is set to "papi" and select the counters you want to
enable in `hwcounters.papi_events`. Here is an example to trace total
instructions and cycles:

```
instrumentation.version = "ovni"
ovni.level = 2
hwcounters.backend = "papi"
hwcounters.papi_events = [ "PAPI_TOT_INS", "PAPI_TOT_CYC" ]
```

You can use the `papi_avail` tool to see which counters are available on a
particular machine, along with a description of each counter. Each CPU has a
limit on how many counters can be enabled at the same time, which is reported
in the *Number Hardware Counters* line.

The events for HWC are generated in cpu.prv and thread.prv for CPUs and threads,
respectively. For each enabled hardware counter, a new configuration file will
be created at `cfg/cpu/nosv/hwc-*.cfg` and `cfg/thread/nosv/hwc-*.cfg` with the
corresponding name of the counter.