forked from rarias/jungle
| 
 | |
|                     Known sources of noise
 | |
|                        in MareNostrum 4
 | |
| 
 | |
| 
 | |
| ABSTRACT
 | |
| 
 | |
|   The experiments run at MareNostrum 4 show that there are several 
 | |
|   factors that can affect the execution time. Some may even become the 
 | |
|   dominant part of the time, rendering the experiment invalid.
 | |
| 
 | |
|   This document lists all known sources of variability and tries to give 
 | |
|   an overview of how to detect and correct the problems.
 | |
| 
 | |
| 1. Notable sources of variability
 | |
| 
 | |
|   All sources were found in the MareNostrum 4 cluster, but they may 
 | |
|   apply to other machines as well. Some can be detected, so that their 
 | |
|   effect can be ruled out, but others can't. Also, some problems only 
 | |
|   occur with low probability.
 | |
| 
 | |
|   Other sources of variability with a low effect, say lower than 1% of 
 | |
|   the mean time, are not listed here.
 | |
| 
 | |
| 1.1 The daemon slurmstepd eats sys CPU in a new thread
 | |
| 
 | |
|   While a job is running, the slurmstepd process occasionally creates a 
 | |
|   thread that uses a lot of sys CPU for about 10 seconds. The frequency 
 | |
|   of this event is unknown. It was first observed in the nbody program, 
 | |
|   where it almost doubles the time per iteration, because the other 
 | |
|   processes wait for the slowed-down one before continuing to the next 
 | |
|   iteration. The SLURM version was 17.11.7 and the program was executed 
 | |
|   with sbatch+srun. See the issue for more details:
 | |
| 
 | |
|     https://pm.bsc.es/gitlab/rarias/bsc-nixpkgs/-/issues/19
 | |
| 
 | |
|   It can be detected by looking at the cycles per us view with Extrae, 
 | |
|   with the PAPI counters enabled. It shows a slowdown in one process 
 | |
|   when the problem occurs. Also, perf-sched(1) can be used to trace 
 | |
|   context switches to other programs but requires access to the debugfs.
 | |
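
  When no Extrae trace is at hand, a first screening can be done on the
  per-iteration times alone. The following is a minimal sketch, not the
  method used in the issue: the function name, the sample times and the
  1.5 factor are assumptions for illustration. It only flags suspicious
  iterations, it does not identify slurmstepd as the cause.

```python
from statistics import median


def slow_iterations(times, factor=1.5):
    """Return the indices of iterations slower than factor * median."""
    if not times:
        return []
    limit = factor * median(times)
    return [i for i, t in enumerate(times) if t > limit]


# Hypothetical per-iteration times in seconds. Iterations 3 and 4 take
# almost twice as long as the rest, as described for the nbody case.
times = [1.00, 1.02, 0.99, 1.95, 1.97, 1.01]
print(slow_iterations(times))
```

  The median is used rather than the mean so that a few slow iterations
  do not move the reference value.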
| 
 | |
| 1.2 MPICH uses ethernet rather than infiniband
 | |
| 
 | |
|   Some MPI implementations (like MPICH) can silently use non-optimal 
 | |
|   fabrics like ethernet rather than infiniband because they are 
 | |
|   misconfigured.
 | |
| 
 | |
|   It can be detected by running latency benchmarks like the OSU micro 
 | |
|   benchmarks, which should report a low latency. It can also be checked 
 | |
|   by using strace(1) to determine which network card is being used.
 | |
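
  The latency check can be automated by parsing the benchmark output.
  This is a sketch under assumptions: it expects the two-column output
  of osu_latency (header lines starting with '#', then size and latency
  in us), and the 5 us threshold and the sample values are hypothetical.
  The threshold should be calibrated against a run known to use the
  right fabric.

```python
def parse_osu_latency(output):
    """Map message size (bytes) to latency (us) from osu_latency output."""
    latency = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header lines, which begin with '#'.
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        latency[int(fields[0])] = float(fields[1])
    return latency


def looks_slow(latency, max_small_us=5.0):
    """True if the smallest message exceeds the (assumed) threshold."""
    smallest = min(latency)
    return latency[smallest] > max_small_us


# Hypothetical output, shortened to three message sizes.
sample = """\
# OSU MPI Latency Test
# Size          Latency (us)
0                       1.12
1                       1.15
2                       1.14
"""

print(looks_slow(parse_osu_latency(sample)))
```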
| 
 | |
| 1.3 CPU binding
 | |
| 
 | |
|   A thread may switch between CPUs when running, leading to a drop in 
 | |
|   performance. To ensure that it remains on the same CPU, it can be 
 | |
|   bound with srun(1) or sbatch(1) using the --cpu-bind option, or using 
 | |
|   taskset(1).
 | |
| 
 | |
|   It can be detected by running the program with Extrae and using the 
 | |
|   General/view/executing_cpu.cfg configuration in Paraver. After 
 | |
|   adjusting the scale, each process must have its own color (the 
 | |
|   assigned CPU) and keep it constant. Otherwise, processes are 
 | |
|   migrating between CPUs.
 | |
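
  Without a trace, CPU changes can also be sampled from /proc on Linux.
  The sketch below is not a replacement for the Paraver view; it assumes
  the field layout documented in proc(5), where field 39 of the stat
  file is the CPU on which the task last ran. The function names, the
  sample count and the interval are chosen for illustration.

```python
import time


def parse_processor(stat_line):
    """Return the CPU number from a /proc/<pid>/stat line (field 39)."""
    # The command name (field 2) may contain spaces or parentheses, so
    # split after the last ')' and count from the state (field 3).
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[39 - 3])


def count_migrations(cpus):
    """Count how many times two consecutive samples differ."""
    return sum(1 for a, b in zip(cpus, cpus[1:]) if a != b)


def sample_cpus(path="/proc/self/stat", samples=100, interval=0.01):
    """Sample the CPU where the task last ran (Linux only)."""
    cpus = []
    for _ in range(samples):
        with open(path) as f:
            cpus.append(parse_processor(f.read()))
        time.sleep(interval)
    return cpus
```

  Calling count_migrations(sample_cpus()) should return 0 for a bound
  process. To watch a single thread of another process, the path
  /proc/<pid>/task/<tid>/stat can be passed instead. Sampling can miss
  short migrations, so a zero count is only an indication.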
| 
 | |
| 1.4 Libraries that use dlopen(3)
 | |
| 
 | |
|   Some libraries or programs try to determine which components are 
 | |
|   available in a system by looking for specific libraries in the search 
 | |
|   path determined at runtime.
 | |
| 
 | |
|   This behavior can cause the execution time of a program to change 
 | |
|   depending on environment variables like LD_LIBRARY_PATH.
 | |
| 
 | |
|   It can be detected by setting LD_DEBUG=all (see ld.so(8)) or using 
 | |
|   strace(1) when running the program.
 | |
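
  To compare two environments, the strace(1) output of each run can be
  reduced to the set of shared objects that were actually opened. The
  following sketch assumes the usual strace line format for open(2) and
  openat(2); the function name and the library paths in the example are
  hypothetical.

```python
import re

# Matches open(2)/openat(2) lines as printed by strace, for example:
#   openat(AT_FDCWD, "/usr/lib/libfoo.so.1", O_RDONLY|O_CLOEXEC) = 3
OPEN_RE = re.compile(r'open(?:at)?\(.*?"([^"]+)".*\)\s+=\s+(-?\d+)')


def opened_libraries(strace_output):
    """Return the shared objects that were opened successfully."""
    libs = set()
    for line in strace_output.splitlines():
        m = OPEN_RE.search(line)
        if m is None:
            continue
        path, ret = m.group(1), int(m.group(2))
        name = path.rsplit("/", 1)[-1]
        # A negative return value means the open failed (e.g. ENOENT).
        if ret >= 0 and (name.endswith(".so") or ".so." in name):
            libs.add(path)
    return libs


# Hypothetical traces of the same program in two environments.
run_a = (
    'openat(AT_FDCWD, "/opt/a/libfoo.so", O_RDONLY|O_CLOEXEC)'
    ' = -1 ENOENT (No such file or directory)\n'
    'openat(AT_FDCWD, "/usr/lib/libfoo.so", O_RDONLY|O_CLOEXEC) = 3\n'
)
run_b = 'openat(AT_FDCWD, "/opt/a/libfoo.so", O_RDONLY|O_CLOEXEC) = 3\n'

# Libraries that differ between the two runs.
print(sorted(opened_libraries(run_a) ^ opened_libraries(run_b)))
```

  An empty difference suggests that both runs loaded the same files,
  although it says nothing about libraries mapped before tracing began.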
| 
 | |
| 1.5 Intel MPI library selection
 | |
| 
 | |
|   The Intel MPI library has several variants which are loaded at run 
 | |
|   time: debug, release, debug_mt and release_mt. The I_MPI_THREAD_SPLIT 
 | |
|   variable controls whether the multithread capabilities are enabled or 
 | |
|   not.
 | |
| 
 | |
| /* vim: set ts=2 sw=2 tw=72 fo=watqc expandtab spell autoindent: */
 |