Document the execution pipeline
parent
81004b5ee6
commit
e8d884a627
9
garlic/doc/Makefile
Normal file
@@ -0,0 +1,9 @@
all: execution.pdf execution.txt

%.pdf: %.ms
	groff -ms -t -Tpdf $^ > $@
	#pdfms $^ 2>&1 >$@ | sed 's/^troff: //g'
	killall -HUP mupdf

%.txt: %.ms
	groff -ms -t -Tutf8 $^ > $@
203
garlic/doc/execution.ms
Normal file
@@ -0,0 +1,203 @@
|
||||
.TL
|
||||
Garlic execution
|
||||
.AU
|
||||
Rodrigo Arias Mallo
|
||||
.AI
|
||||
Barcelona Supercomputing Center
|
||||
.AB
|
||||
.LP
|
||||
This document covers the execution of experiments in the Garlic
|
||||
benchmark, which are performed under strict conditions. The different
|
||||
stages of the execution are documented so the experimenter can have a
|
||||
global overview of how the benchmark runs under the hood.
|
||||
During the execution of the experiments, the results are
|
||||
stored in a file which will be used in later processing steps.
|
||||
.AE
|
||||
.\"#####################################################################
|
||||
.nr GROWPS 3
|
||||
.nr PSINCR 1.5p
|
||||
.\".nr PD 0.5m
|
||||
.nr PI 2m
|
||||
.\".2C
|
||||
.\"#####################################################################
|
||||
.NH 1
|
||||
Introduction
|
||||
.LP
|
||||
Every experiment in the Garlic
|
||||
benchmark is controlled by one
|
||||
.I nix
|
||||
file.
|
||||
An experiment consists of several shell scripts which are executed
|
||||
sequentially and perform several tasks to set up the
|
||||
.I "execution environment" ,
|
||||
and finally launch the actual program that is being analyzed.
|
||||
The scripts that prepare the environment and the program itself are
|
||||
called the
|
||||
.I stages
|
||||
of the execution, which together form the
|
||||
.I "execution pipeline"
|
||||
or simply the
|
||||
.I pipeline .
|
||||
The experimenter must know in detail all the stages
|
||||
involved in the pipeline, as they can greatly affect the
|
||||
result of the execution.
|
||||
.PP
|
||||
The experiments have a very strong dependency on the cluster where they
|
||||
run, as it heavily affects the results. The software used for the
|
||||
benchmark is carefully configured for the hardware used in the
|
||||
execution. In particular, the experiments are designed to run in
|
||||
the MareNostrum 4 cluster with the SLURM workload manager. In the future we
|
||||
plan to add support for other clusters, in order to execute the
|
||||
experiments on other machines.
|
||||
.\"#####################################################################
|
||||
.NH 1
|
||||
Isolation
|
||||
.LP
|
||||
The benchmark is designed so that both the compilation of every software
|
||||
package and the execution of the experiment are performed under strict
|
||||
conditions. Therefore, we can provide a guarantee that two executions
|
||||
of the same experiment are actually running the same program in the same
|
||||
environment.
|
||||
.PP
|
||||
All the software used by an experiment is included in the
|
||||
.I "nix store"
|
||||
which is, by convention, located in the
|
||||
.CW /nix
|
||||
directory. Unfortunately, it is common for libraries to try to load
|
||||
software from other paths like
|
||||
.CW /usr
|
||||
or
|
||||
.CW /lib .
|
||||
It is also common that configuration files are loaded from
|
||||
.CW /etc
|
||||
and from the home directory of the user that runs the experiment.
|
||||
Additionally, some environment variables are recognized by the libraries
|
||||
used in the experiment, which change their behavior. As we cannot
|
||||
control the software and configuration files in those directories, we
|
||||
cannot guarantee that the execution behaves as intended.
|
||||
.PP
|
||||
In order to avoid this problem, we create a secure
|
||||
.I sandbox
|
||||
where only the files in the nix store are available (with some
|
||||
exceptions). Therefore, even if the libraries try to access any path
|
||||
outside the nix store, they will find that the files are not there
|
||||
anymore.
|
||||
.\"#####################################################################
|
||||
.NH 1
|
||||
Execution stages
|
||||
.LP
|
||||
There are several predefined stages which form the
|
||||
.I standard
|
||||
execution pipeline. The standard pipeline is divided into two main parts:
|
||||
1) connecting to the target machine and submitting a job to SLURM, and 2)
|
||||
executing the job itself.
|
||||
.NH 2
|
||||
Job submission
|
||||
.LP
|
||||
Three stages are involved in the job submission. The
|
||||
.I trebuchet
|
||||
stage connects via
|
||||
.I ssh
|
||||
to the target machine and executes the next stage there. Once in the
|
||||
target machine, the
|
||||
.I isolate
|
||||
stage is executed to enter the sandbox. Finally, the
|
||||
.I sbatch
|
||||
stage runs the
|
||||
.I sbatch(1)
|
||||
program with a job script that simply executes the next stage. The
|
||||
sbatch program reads the
|
||||
.CW /etc/slurm/slurm.conf
|
||||
file from outside the sandbox, so we must explicitly allow this file to
|
||||
be available as well as the
|
||||
.I munge
|
||||
socket, used for authentication.
|
||||
.PP
|
||||
The rationale behind running sbatch from the sandbox is that the options
|
||||
provided in environment variables override the options from the job
|
||||
script. Therefore, we avoid this problem by running sbatch from the
|
||||
sandbox, where potentially dangerous environment variables have been removed.
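.PP
As a minimal sketch of this effect (not the actual isolate stage, whose
implementation is not shown in this document), the following shell
fragment starts a child process with an empty environment using
.I env(1) .
The variable
.CW SBATCH_PARTITION
is used here only as an example of an option that sbatch can read from
the environment, and the value is hypothetical.
.DS L
```shell
# Hypothetical value: stands for any sbatch option set by the user's shell.
export SBATCH_PARTITION=debug

# A normal child process inherits the variable from its parent.
leaked=$(/bin/sh -c 'echo "${SBATCH_PARTITION:-unset}"')

# A child started with "env -i" receives an empty environment instead.
clean=$(env -i /bin/sh -c 'echo "${SBATCH_PARTITION:-unset}"')

echo "inherited: $leaked"
echo "clean: $clean"
```
.DE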
|
||||
.NH 2
|
||||
Setting up the environment
|
||||
.LP
|
||||
Once the job has been selected for execution, the SLURM daemon allocates
|
||||
the resources and then selects one of the nodes to run the job script
|
||||
(it is not executed in parallel). Additionally, the job script is executed
|
||||
from a child process, forked from one of the SLURM processes, which is
|
||||
outside the sandbox. Therefore, we first run the
|
||||
.I isolate
|
||||
stage
|
||||
to enter the sandbox again.
|
||||
.PP
|
||||
The next stage is called
|
||||
.I control
|
||||
and determines if enough data has been generated by the experiment or if
|
||||
it should continue repeating the execution. At the current time, it is only
|
||||
implemented as a simple loop that runs the next stage a fixed number of
|
||||
times.
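.PP
Such a loop could look like the following sketch. The names
.CW loops
and
.CW next_stage
are hypothetical and only illustrate the idea; the real control stage
may be written differently.
.DS L
```shell
# Hypothetical number of repetitions of the experiment.
loops=3
runs=0

# Stand-in for the next stage of the pipeline.
next_stage() {
    runs=$((runs + 1))
    echo "run $runs"
}

# Run the next stage a fixed number of times.
i=1
while [ "$i" -le "$loops" ]; do
    next_stage
    i=$((i + 1))
done
```
.DE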
|
||||
.PP
|
||||
The following stage is
|
||||
.I srun
|
||||
which usually launches several copies of the next stage to run in
|
||||
parallel (when using more than one task). It runs one copy per task,
|
||||
effectively creating one process per task. The set of CPUs available to
|
||||
each process is determined by the parameter
|
||||
.I --cpu-bind
|
||||
and it is crucial to set it correctly; it is documented in the
|
||||
.I srun(1)
|
||||
manual. Appending the
|
||||
.I verbose
|
||||
value to the cpu bind option causes srun to print the assigned affinity
|
||||
of each task so that it can be reviewed in the execution log.
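.PP
The affinity can also be inspected from inside a running task. The
fragment below is a sketch that assumes a Linux node where
.CW /proc
is visible to the process, which may not hold inside the sandbox.
.DS L
```shell
# Assumption: Linux lists the CPUs allowed for a process in the
# Cpus_allowed_list field of its status file.
allowed=$(grep Cpus_allowed_list /proc/self/status)
echo "$allowed"
```
.DE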
|
||||
.PP
|
||||
The mechanism by which srun executes multiple processes is the same one used
|
||||
by sbatch: it forks from a SLURM daemon running in the computing nodes.
|
||||
Therefore, the execution begins outside the sandbox. The next stage is
|
||||
.I isolate
|
||||
which enters the sandbox again in every task (from now on, all stages
|
||||
are running in parallel).
|
||||
.PP
|
||||
At this point in the execution, we are ready to run the actual program
|
||||
that is the subject of the experiment. Usually, the programs require some
|
||||
arguments to be passed on the command line. The
|
||||
.I argv
|
||||
stage sets the arguments and optionally some environment variables and
|
||||
executes the last stage, the
|
||||
.I program .
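.PP
The behavior of the argv stage can be sketched as follows. The function
.CW argv_stage ,
the variable
.CW EXAMPLE_VAR
and the arguments are all hypothetical, and a small shell command stands
in for the program under study.
.DS L
```shell
# Hypothetical argv stage. The body runs in a subshell so that exec
# replaces only the subshell and not the calling shell.
argv_stage() (
    # Optional environment variable for the program.
    export EXAMPLE_VAR=1
    # Replace the stage with the program, passing the arguments.
    exec /bin/sh -c 'echo "$EXAMPLE_VAR $1 $2"' program --size 128
)

out=$(argv_stage)
echo "$out"
```
.DE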
|
||||
.NH 2
|
||||
Stage overview
|
||||
.LP
|
||||
The standard execution pipeline contains the stages listed in Table
|
||||
1, in order of execution. Additional stages can be placed before
|
||||
the argv stage, to modify the execution. Usually debugging programs and
|
||||
other options can be included there.
|
||||
.KF
|
||||
.TS
center;
lB cB cB cB
l c c c.
_
Stage	Target	Safe	Copies
_
trebuchet	no	no	no
isolate	yes	no	no
sbatch	yes	yes	no
isolate	yes	no	no
control	yes	yes	no
srun	yes	yes	no
isolate	yes	no	yes
argv	yes	yes	yes
program	yes	yes	yes
_
.TE
|
||||
.QP
|
||||
.B "Table 1" :
|
||||
The stages of a standard execution pipeline. The
|
||||
.B target
|
||||
column indicates whether the stage is running in the target cluster;
|
||||
.B safe
|
||||
states if the stage is running in the sandbox and
|
||||
.B copies
|
||||
if there are several instances of the stage running in parallel.
|
||||
.QE
|
||||
.KE
|