ovni/doc/user/runtime/distributed.md

42 lines
1.6 KiB
Markdown
Raw Permalink Normal View History

# Distributed traces (MPI)
The ovni trace is designed to support concurrent programs running in different
nodes in a cluster. It is often the case that the monotonic clock
(`CLOCK_MONOTONIC`) are not synchronized between machines (in general they
measure the time since boot).
To generate a coherent Paraver trace, the offsets of the clocks need to be
provided to the emulator too. To do so, run the `ovnisync` program using MPI on
2023-03-23 17:47:30 +01:00
the same nodes your workload will use.
!!! warning
Run only one MPI process of ovnisync per node.
If you are using SLURM, you may want to use something like:
% srun ./application
2023-03-23 17:47:30 +01:00
% srun --ntasks-per-node=1 ovnisync
!!! warning
Beware that you cannot launch two MPI programs inside the same srun session,
you must invoke srun twice.
By default, it will generate the `ovni/clock-offsets.txt` file, with the
relative offsets to the rank 0 of MPI. If the `OVNI_TRACEDIR` environment
variable is defined, the default file is `$OVNI_TRACEDIR/clock-offsets.txt`.
The emulator will automatically pick the offsets when processing the trace.
Use the ovnisync `-o` option to select a different output path (see the `-c`
option in ovniemu to load the file).
Here is an example table with three nodes, all units are in nanoseconds. The
standard deviation is less than 1 us:
```
rank hostname offset_median offset_mean offset_std
0 xeon01 0 0.000000 0.000000
1 xeon04 1165382584 1165382582.900000 135.286341
2 xeon05 3118113507 3118113599.070000 180.571610
```