Remove NOISE file

parent 4533c94b4f
commit f9c832654e

NOISE: 147 lines removed
@@ -1,147 +0,0 @@

Known sources of noise
in MareNostrum 4

ABSTRACT

The experiments run at MareNostrum 4 show that several factors can
affect the execution time. Some may even become the dominant part of
the time, rendering the experiment invalid.

This document lists all known sources of variability and gives an
overview of how to detect and correct the problems.

1. Notable sources of variability

The sources listed here were found in the MareNostrum 4 cluster, but
they may apply to other machines. Some have a detection mechanism, so
their effect can be ruled out, but others don't. Also, some problems
only occur with low probability.

Other sources of variability with a low effect, say lower than 1% of
the mean time, are not listed here.

1.1 The daemon slurmstepd eats sys CPU in a new thread

While a job is running, the slurmstepd process occasionally creates a
thread that uses quite a lot of CPU for about 10 seconds. The
frequency of this event is unknown. It was first observed in the nbody
program, where it almost doubles the time per iteration, because the
other processes wait for the one with the slow CPU before continuing
to the next iteration. The SLURM version was 17.11.7 and the program
was executed with sbatch+srun. See the issue for more details:

  https://pm.bsc.es/gitlab/rarias/bsc-nixpkgs/-/issues/19

It can be detected by looking at the cycles per us view with Extrae,
with the PAPI counters enabled. It shows a slowdown in one process
when the problem occurs. Also, perf-sched(1) can be used to trace
context switches to other programs, but it requires access to debugfs.

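As a complement to the tools above, the sys CPU time of each thread of
a process can be sampled from procfs. The following is only a minimal
sketch, assuming Linux and read access to /proc/<pid>/task of the
slurmstepd process. The helper names are hypothetical and this is not
the method used in the linked issue.

```python
import os


def parse_cpu_ticks(stat_line):
    """Return (comm, utime, stime) from one /proc/<pid>/task/<tid>/stat line.

    The comm field is wrapped in parentheses and may contain spaces, so
    split on the last ')' before reading the numeric fields.
    """
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # tail starts at field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return comm, utime, stime


def thread_sys_ticks(pid):
    """Map each thread id of pid to its sys CPU time in clock ticks."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                _, _, stime = parse_cpu_ticks(f.read())
        except OSError:
            continue  # the thread exited while we were reading
        ticks[int(tid)] = stime
    return ticks
```

Calling thread_sys_ticks() twice, a few seconds apart, and comparing
the results shows which thread is accumulating sys time. Dividing by
os.sysconf("SC_CLK_TCK") converts ticks to seconds.
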
1.2 MPICH uses ethernet rather than infiniband

Some MPI implementations (like MPICH) can silently use non-optimal
fabrics, such as ethernet rather than infiniband, because they are
misconfigured.

It can be detected by running latency benchmarks like the OSU micro
benchmarks, which should report a low latency. It can also be checked
by using strace(1) to see which network card is being used.

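The latency check can be automated by parsing the benchmark output. A
minimal sketch, assuming the usual two-column output of osu_latency
(message size and latency in us, with header lines starting with '#').
The function names and the 5 us threshold are assumptions made for
illustration, not values measured in MareNostrum 4.

```python
def parse_osu_latency(text):
    """Parse osu_latency output into a {message size: latency in us} dict."""
    latency = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        size, value = line.split()[:2]
        latency[int(size)] = float(value)
    return latency


def looks_slow(latency, max_small_us=5.0):
    """True if the smallest message exceeds the assumed latency bound.

    max_small_us is an illustrative threshold, not a measured value; it
    must be calibrated against a run known to use the fast fabric.
    """
    return latency[min(latency)] > max_small_us
```

A run that reports a small-message latency far above the calibrated
bound is a hint that the slow fabric is being used.
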
1.3 CPU binding

A thread may switch between CPUs when running, leading to a drop in
performance. To ensure that it remains in the same CPU, it can be
bound with srun(1) or sbatch(1) using the --cpu-bind option, or using
taskset(1).

It can be detected by running the program with Extrae and using the
General/view/executing_cpu.cfg configuration in Paraver. After
adjusting the scale, all processes must have a different color from
each other (the assigned CPU) and keep it constant. Otherwise, CPU
changes are happening.

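The binding can also be inspected or set from inside a process. A
minimal sketch, assuming Linux, where Python provides
os.sched_getaffinity() and os.sched_setaffinity(). The helper
pin_to_one_cpu is hypothetical and not part of SLURM.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict pid (0 means the calling process) to a single allowed CPU.

    Returns the chosen CPU. The lowest-numbered allowed CPU is picked,
    so the set of CPUs granted by the scheduler is never widened.
    """
    allowed = os.sched_getaffinity(pid)
    cpu = min(allowed)
    os.sched_setaffinity(pid, {cpu})
    return cpu
```

The effect is similar to starting the program under taskset(1) with a
single CPU. Printing os.sched_getaffinity(0) at startup is a cheap way
to record in the logs which CPUs a process was allowed to use.
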
1.4 Libraries that use dlopen(3)

Some libraries or programs try to determine which components are
available in a system by looking for specific libraries in the search
path determined at runtime.

This behavior can cause the execution time of a program to change
depending on environment variables like LD_LIBRARY_PATH.

It can be detected by setting LD_DEBUG=all (see ld.so(8)) or using
strace(1) when running the program.

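The strace(1) output can be reduced to the list of shared libraries
that were actually opened. A minimal sketch, assuming a log of the
open and openat system calls (for instance one written with the -o
option of strace). The function name is hypothetical and the regular
expression only covers the common line format.

```python
import re

# Matches open()/openat() calls on shared objects, for example:
#   openat(AT_FDCWD, "/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
OPEN_RE = re.compile(
    r'open(?:at)?\(.*?"([^"]+\.so(?:\.[0-9.]+)?)".*\)\s*=\s*(-?\d+)'
)


def opened_libraries(strace_text):
    """Return the shared libraries successfully opened, without repeats."""
    libs = []
    for line in strace_text.splitlines():
        m = OPEN_RE.search(line)
        # A negative return value means the open failed (e.g. ENOENT).
        if m and int(m.group(2)) >= 0 and m.group(1) not in libs:
            libs.append(m.group(1))
    return libs
```

Comparing the lists obtained in two environments (for example with a
different LD_LIBRARY_PATH) shows whether the program picked up
different components.
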
1.5 Intel MPI library selection

The Intel MPI library has several variants which are loaded at run
time: debug, release, debug_mt and release_mt. The I_MPI_THREAD_SPLIT
variable controls whether the multithread capabilities are enabled or
not.

1.6 LLVM and OpenMP problem

The LLVM OpenMP implementation is installed in libomp.so; however, two
symbolic links are created for libgomp.so and libiomp5.so:

  libgomp.so -> libomp.so
  libiomp5.so -> libomp.so
  libomp.so

So applications compiled with OpenMP by other compilers may end up
using the LLVM implementation. This can be observed by setting
LD_DEBUG=all or using strace(1) and looking for the libomp.so library
being loaded.

In bscpkgs the symbolic links have been removed for the clangOmpss2
compiler.

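Whether a given library directory contains such links can be checked
by resolving them. A minimal sketch, where aliases_of is a
hypothetical helper and the directory to inspect is left to the
caller:

```python
import os


def aliases_of(lib_dir, target="libomp.so"):
    """List names in lib_dir that are symbolic links resolving to target."""
    real_target = os.path.realpath(os.path.join(lib_dir, target))
    found = []
    for name in sorted(os.listdir(lib_dir)):
        path = os.path.join(lib_dir, name)
        if os.path.islink(path) and os.path.realpath(path) == real_target:
            found.append(name)
    return found
```

An empty result for the lib directory of a compiler means that neither
libgomp.so nor libiomp5.so is redirected to the LLVM implementation
there.
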
1.7 Nix-shell does not allow isolation

The nix-shell environment is not isolated, so the compilation process
then tries to use headers and libs from /usr.

This can induce compilation errors that do not happen inside
nix-build. Do not use nix-shell when reproducibility is required.

1.8 Make doesn't rebuild objects

When using a local repository as the source code (e.g. with developer
mode on), a make clean at the preBuild stage is required.

Nix sets the same modification date (one second after the Epoch,
1970-01-01 at 00:00:01 UTC) to all the files in the nix store (also
those copied from repositories). Make checks the modification dates of
the files to decide whether to run the compilation instructions. If
any object or binary file exists outside of Nix at the time we build
within Nix, it will be copied with the current date and consequently
not updated during the Nix compilation process.

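The timestamp rule that causes the problem can be sketched as follows.
This is a simplified model of how make decides whether a target is out
of date, written for illustration; it is not the implementation of any
make program, and needs_rebuild is a hypothetical name.

```python
import os


def needs_rebuild(target, prerequisites):
    """Mimic the timestamp rule of make for a single target.

    The target is rebuilt when it is missing or when any prerequisite
    has a newer modification time than the target.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With a source file dated one second after the Epoch and a stale object
file carrying the current date, needs_rebuild() returns False, so the
object is never recompiled. Removing the object first, as make clean
does, forces the rebuild.
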
1.9 Sbatch silently fails on parsing

When submitting a job with a wrong specification in MN4 with SLURM
17.11.9-2, for example with this bogus line:

  #SBATCH --nodes=1 2

sbatch silently fails to parse the options and falls back to the
defaults, without reporting any error.

We have improved our checking to detect bogus options passed to SLURM,
so we prevent this problem from happening.

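A check of this kind could look like the sketch below. It is only a
heuristic written for illustration and not the checking mentioned
above: the function name is hypothetical, and it only flags a bare
value that follows a --name=value token, as in the bogus line shown
before.

```python
import shlex


def bogus_sbatch_lines(script_text):
    """Return (line number, line) pairs for suspicious #SBATCH directives.

    A directive is flagged when a bare value follows a --name=value
    token, as in '#SBATCH --nodes=1 2', or when it cannot be tokenized.
    """
    flagged = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if not line.startswith("#SBATCH"):
            continue
        try:
            tokens = shlex.split(line[len("#SBATCH"):])
        except ValueError:  # for example, an unbalanced quote
            flagged.append((number, line))
            continue
        for prev, tok in zip(tokens, tokens[1:]):
            long_with_value = prev.startswith("--") and "=" in prev
            if long_with_value and not tok.startswith("-"):
                flagged.append((number, line))
                break
    return flagged
```

Refusing to submit a script for which the result is not empty turns
the silent fallback into an explicit error.
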
1.10 The srun program misses signals after MPI_Finalize

When a program receives a signal such as SIGSEGV after calling
MPI_Finalize, srun at version 17.11.7 doesn't return an error code but
exits with 0.

This can cause faulty programs to go undetected when only checking the
return code of srun. A better approach is to check the exit code with
sacct(1) or write the exit code to a file and check it later.

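The second approach (writing the exit code to a file) can be sketched
with a small wrapper that runs the real program. This is a minimal
illustration, assuming the wrapper is what srun launches;
run_and_record and the status file are hypothetical names.

```python
import subprocess


def run_and_record(argv, status_path):
    """Run argv and write its exit status to status_path.

    subprocess reports death by signal N as the negative value -N, so
    a SIGSEGV (signal 11 on Linux) is recorded as -11.
    """
    status = subprocess.run(argv).returncode
    with open(status_path, "w") as f:
        f.write(f"{status}\n")
    return status
```

In a multi-process job each rank needs its own status file, for
example by including the SLURM_PROCID environment variable in the file
name. Any recorded value other than 0 marks the run as failed,
whatever srun returned.
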
/* vim: set ts=2 sw=2 tw=72 fo=watqc expandtab spell autoindent: */