This commit introduces the OVNI_TRACEDIR environment variable to change the directory where traces are generated. By default, when the envar is not defined, the trace is still generated in the ovni directory. The envar can take a trace directory name, a relative path to the directory, or its absolute path. In the first case, the directory is created in the current path $PWD. Both libovni (rt) and ovnisync read this environment variable.
Distributed traces (MPI)
The ovni trace is designed to support concurrent programs running on different
nodes of a cluster. The monotonic clocks (CLOCK_MONOTONIC) of different
machines are often not synchronized, as in general they measure the time since
boot.
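As a small illustration of why raw timestamps from different nodes cannot be compared directly, the sketch below reads CLOCK_MONOTONIC and CLOCK_REALTIME on the local machine. It is not part of ovni; the helper names `monotonic_ns` and `realtime_ns` are made up for this example, and it assumes a Unix system where Python exposes `time.clock_gettime_ns`.

```python
import time


def monotonic_ns() -> int:
    """Read CLOCK_MONOTONIC in nanoseconds (hypothetical helper, Unix only)."""
    return time.clock_gettime_ns(time.CLOCK_MONOTONIC)


def realtime_ns() -> int:
    """Read CLOCK_REALTIME (wall-clock time) in nanoseconds."""
    return time.clock_gettime_ns(time.CLOCK_REALTIME)


if __name__ == "__main__":
    # The monotonic value counts from an arbitrary per-machine starting
    # point (typically boot), so two nodes booted at different times
    # report unrelated values at the same instant.
    print("CLOCK_MONOTONIC:", monotonic_ns(), "ns")
    print("CLOCK_REALTIME: ", realtime_ns(), "ns")
```

Running this on two nodes at the same moment would typically print very different CLOCK_MONOTONIC values, which is the gap the clock offsets are meant to describe.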
To generate a coherent Paraver trace, the emulator also needs the offsets
between the clocks. To measure them, run the ovnisync program using MPI on the
same nodes your workload will use.
!!! warning
    Run only one MPI process of ovnisync per node.
If you are using SLURM, you may want to use something like:
% srun ./application
% srun --ntasks-per-node=1 ovnisync
!!! warning
    You cannot launch two MPI programs inside the same srun session; you must
    invoke srun twice.
By default, ovnisync generates the ovni/clock-offsets.txt file, which contains
the clock offsets relative to MPI rank 0. If the OVNI_TRACEDIR environment
variable is defined, the default file is $OVNI_TRACEDIR/clock-offsets.txt
instead. The emulator automatically picks up the offsets when processing the
trace.
Use the ovnisync -o option to select a different output path (see the -c
option in ovniemu to load the file).
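The rule for the default output file can be summarized in a few lines. The following is only a sketch of the behavior described above, not the code ovnisync uses; the function name `default_offsets_path` is hypothetical, and how an empty OVNI_TRACEDIR value is treated is not covered by the text, so the sketch only distinguishes defined from undefined.

```python
import os
from pathlib import Path


def default_offsets_path(env=None) -> Path:
    """Return the default clock offsets file for a given environment.

    Hypothetical helper: mirrors the documented rule only.
    """
    env = os.environ if env is None else env
    # Fall back to the "ovni" directory when OVNI_TRACEDIR is not defined.
    tracedir = env.get("OVNI_TRACEDIR", "ovni")
    return Path(tracedir) / "clock-offsets.txt"


if __name__ == "__main__":
    print(default_offsets_path({}))
    print(default_offsets_path({"OVNI_TRACEDIR": "/tmp/run1"}))
```

Passing the environment as a dictionary keeps the function easy to check without modifying the real process environment; the `/tmp/run1` directory is just an example value.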
Here is an example table with three nodes. All values are in nanoseconds, and the standard deviation of every offset is below 1 us:
rank  hostname  offset_median  offset_mean        offset_std
0     xeon01    0              0.000000           0.000000
1     xeon04    1165382584     1165382582.900000  135.286341
2     xeon05    3118113507     3118113599.070000  180.571610
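To check a measurement like the one above, the table can be parsed and the largest standard deviation compared against a threshold. This is a rough sketch that assumes the file is whitespace-separated with a single header line, exactly as in the example; the `ClockOffset` type and `parse_offsets` function are illustrative names, not part of ovni.

```python
from typing import List, NamedTuple


class ClockOffset(NamedTuple):
    rank: int
    hostname: str
    median_ns: int
    mean_ns: float
    std_ns: float


def parse_offsets(text: str) -> List[ClockOffset]:
    """Parse a whitespace-separated offsets table with one header line."""
    lines = [line for line in text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # skip the header line
        rank, host, median, mean, std = line.split()
        rows.append(ClockOffset(int(rank), host, int(median),
                                float(mean), float(std)))
    return rows


# Example data copied from the table above.
EXAMPLE = """\
rank hostname offset_median offset_mean offset_std
0 xeon01 0 0.000000 0.000000
1 xeon04 1165382584 1165382582.900000 135.286341
2 xeon05 3118113507 3118113599.070000 180.571610
"""

if __name__ == "__main__":
    offsets = parse_offsets(EXAMPLE)
    worst = max(o.std_ns for o in offsets)
    # 1 us = 1000 ns
    print(f"worst standard deviation: {worst:.1f} ns (below 1 us: {worst < 1000})")
```

With the example data, the worst standard deviation is the 180.571610 ns of xeon05, well under the 1 us mark mentioned in the text.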