diff --git a/NOISE b/NOISE
deleted file mode 100644
index 7278a08..0000000
--- a/NOISE
+++ /dev/null
@@ -1,147 +0,0 @@
-
-                       Known sources of noise
-                          in MareNostrum 4
-
-
-ABSTRACT
-
-  The experiments run at MareNostrum 4 show that there are several
-  factors that can affect the execution time. Some may even become the
-  dominant part of the time, rendering the experiment invalid.
-
-  This document lists all known sources of variability and tries to give
-  an overview of how to detect and correct the problems.
-
-1. Notable sources of variability
-
-  The sources listed here were found in the MareNostrum 4 cluster, but
-  they may apply to other machines. Some have a detection mechanism so
-  the effect can be neglected, but others don't. Also, some problems
-  only occur with low probability.
-
-  Other sources of variability with a low effect, say lower than 1% of
-  the mean time, are not listed here.
-
-1.1 The daemon slurmstepd eats sys CPU in a new thread
-
-  For a period of about 10 seconds a thread is created from the
-  slurmstepd process when a job is running, which uses quite a lot of
-  CPU. This event happens from time to time with unknown frequency. It
-  was first observed in the nbody program, where it almost doubles the
-  time per iteration because the other processes wait for the one with
-  the slow CPU before continuing to the next iteration. The SLURM version
-  was 17.11.7 and the program was executed with sbatch+srun. See the
-  issue for more details:
-
-    https://pm.bsc.es/gitlab/rarias/bsc-nixpkgs/-/issues/19
-
-  It can be detected by looking at the cycles per us view with Extrae,
-  with the PAPI counters enabled. It shows a slowdown in one process
-  when the problem occurs. Also, perf-sched(1) can be used to trace
-  context switches to other programs but requires access to the debugfs.
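  As a complement to the trace-based detection in 1.1, the iterations
  affected by such a slowdown can be flagged from the per-iteration
  times alone. The following is a minimal sketch, not part of the
  original tooling: the function name, the 1.5 factor and the sample
  times are all assumptions chosen for illustration. It only points at
  suspicious iterations; it cannot tell whether slurmstepd was the
  cause.

```python
from statistics import median


def flag_slow_iterations(times, factor=1.5):
    """Return the indices of iterations slower than factor * median.

    The factor is an assumed threshold: 1.1 reports that the time per
    iteration almost doubles, so 1.5 sits between normal and affected.
    """
    if not times:
        return []
    threshold = factor * median(times)
    return [i for i, t in enumerate(times) if t > threshold]


if __name__ == "__main__":
    # Hypothetical per-iteration times in seconds.
    sample = [1.00, 1.02, 0.99, 1.95, 2.01, 1.01, 1.00]
    print(flag_slow_iterations(sample))
```

  The median is used instead of the mean so that the slow iterations
  themselves do not shift the reference value.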
-
-1.2 MPICH uses Ethernet rather than InfiniBand
-
-  Some MPI implementations (like MPICH) can silently use non-optimal
-  fabrics like Ethernet rather than InfiniBand because they are
-  misconfigured.
-
-  It can be detected by running latency benchmarks like the OSU micro
-  benchmark, which should report a low latency. It can also be checked
-  by using strace(1) to verify which network card is being used.
-
-1.3 CPU binding
-
-  A thread may switch between CPUs when running, leading to a drop in
-  performance. To ensure that it remains on the same CPU it can be
-  bound with srun(1) or sbatch(1) using the --cpu-bind option, or using
-  taskset(1).
-
-  It can be detected by running the program with Extrae and using the
-  General/view/executing_cpu.cfg configuration in Paraver. After
-  adjusting the scale, all processes must have a different color from
-  each other (the assigned CPU) and keep it constant. Otherwise changes
-  of CPUs are happening.
-
-1.4 Libraries that use dlopen(3)
-
-  Some libraries or programs try to determine which components are
-  available in a system by looking for specific libraries in the search
-  path determined at runtime.
-
-  This behavior can cause the execution time of a program to change
-  depending on environment variables like LD_LIBRARY_PATH.
-
-  It can be detected by setting LD_DEBUG=all (see ld.so(8)) or using
-  strace(1) when running the program.
-
-1.5 Intel MPI library selection
-
-  The Intel MPI library has several variants which are loaded at run
-  time: debug, release, debug_mt and release_mt. The
-  I_MPI_THREAD_SPLIT variable controls whether the multithread
-  capabilities are enabled or not.
-
-1.6 LLVM and OpenMP problem
-
-  The LLVM OpenMP implementation is installed in libomp.so; however, two
-  symbolic links are created for libgomp.so and libiomp5.so:
-
-    libgomp.so -> libomp.so
-    libiomp5.so -> libomp.so
-    libomp.so
-
-  So applications compiled with OpenMP by other compilers may end up
-  using the LLVM implementation. This can be observed by setting
-  LD_DEBUG=all or using strace(1) and looking for the libomp.so library
-  being loaded.
-
-  In bscpkgs the symbolic links have been removed for the clangOmpss2
-  compiler.
-
-1.7 Nix-shell does not allow isolation
-
-  Nix-shell is not isolated, so the compilation process may try to
-  use headers and libs from /usr.
-
-  This can cause compilation errors that don't happen inside nix-build.
-  Do not use it when reproducibility is required.
-
-1.8 Make doesn't rebuild objects
-
-  When using a local repo as the source code (e.g. developer mode on), a
-  make clean at the preBuild stage is required.
-
-  Nix sets the same modification date (one second after the Epoch,
-  1970-01-01 at 00:00:01 UTC) to all the files in the nix
-  store (also those copied from repos). Make checks the files'
-  modification dates to decide whether to run the compilation
-  commands. If any object/binary files exist outside of Nix at the time
-  we build within Nix, they will be copied with the current data and
-  consequently not updated during the Nix compilation process.
-
-1.9 Sbatch silently fails on parsing
-
-  Submitting a job with a wrong specification in MN4 with SLURM
-  17.11.9-2 can go unnoticed. For example, given this bogus line:
-
-    #SBATCH --nodes=1 2
-
-  sbatch silently fails to parse the options, falling back to the
-  defaults, without any error.
-
-  We have improved our checking to detect bogus options passed to SLURM,
-  so we prevent this problem from happening.
-
-1.10 The srun program misses signals after MPI_Finalize
-
-  When a program receives a signal such as SIGSEGV after calling
-  MPI_Finalize, srun at version 17.11.7 doesn't return an error code but
-  exits with 0.
-
-  This can cause bogus programs to go undetected when only checking the
-  return code of srun.
-  A better approach is to check the exit code with
-  sacct(1) or write the exit code to a file and check it later.
-
-/* vim: set ts=2 sw=2 tw=72 fo=watqc expandtab spell autoindent: */
-
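  The workaround from 1.10 of writing the exit code to a file can be
  sketched as a small wrapper that is launched in place of the program.
  This is only an illustration under stated assumptions: the names
  run_and_record and read_status and the status file layout are
  hypothetical, and with several ranks each one would need its own
  file name. It relies on the documented behavior of Python's
  subprocess module on POSIX, where a child killed by signal N is
  reported with return code -N.

```python
import subprocess
import sys


def run_and_record(cmd, status_file):
    """Run cmd and write its exit status to status_file.

    A child killed by signal N (returncode -N on POSIX) is stored as
    128 + N, following the usual shell convention, so a SIGSEGV after
    MPI_Finalize is recorded as a failure.
    """
    proc = subprocess.run(cmd)
    code = proc.returncode
    if code < 0:
        code = 128 - code
    with open(status_file, "w") as f:
        f.write(f"{code}\n")
    return code


def read_status(status_file):
    """Return the exit status previously stored in status_file."""
    with open(status_file) as f:
        return int(f.read().strip())


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python wrapper.py status.txt ./program args...
    sys.exit(run_and_record(sys.argv[2:], sys.argv[1]))
```

  The status file is then checked after srun returns, instead of
  trusting the return code of srun itself.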