.TL
Garlic execution
.AU
Rodrigo Arias Mallo
.AI
Barcelona Supercomputing Center
.AB
.LP
This document covers the execution of experiments in the Garlic
benchmark, which are performed under strict conditions.
The stages of the execution are documented so that the experimenter can
have a global overview of how the benchmark runs under the hood.
During the execution of the experiments, the results are stored in a
file which is used in later processing steps.
.AE
.\"#####################################################################
.nr GROWPS 3
.nr PSINCR 1.5p
.\".nr PD 0.5m
.nr PI 2m
.\".2C
.\"#####################################################################
.NH 1
Introduction
.LP
Every experiment in the Garlic
benchmark is controlled by a single
.I nix
file.
An experiment consists of several shell scripts which are executed
sequentially and perform several tasks to set up the
.I "execution environment" ,
and which finally launch the actual program being analyzed.
The scripts that prepare the environment and the program itself are
called the
.I stages
of the execution, which altogether form the
.I "execution pipeline"
or simply the
.I pipeline .
The experimenter must know all the stages involved in the pipeline in
detail, as they can have a great impact on the result of the execution.
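.PP
To illustrate how the stages are chained, the following Python sketch
generates one shell script per stage, where each script ends by running
the next one.
It assumes that every stage hands over control with the shell
.CW exec
builtin, so each stage replaces the previous process instead of nesting
inside it.
The helper
.CW chain_stages
and its arguments are hypothetical; the real stages are described in the
nix file of the experiment.

```python
from typing import Dict, List, Tuple


def chain_stages(stages: List[Tuple[str, str]], program: str) -> Dict[str, str]:
    """Build one shell script per stage; each ends by exec'ing the next.

    `stages` is an ordered list of (script_path, setup_commands) pairs and
    `program` is the command line of the final stage, the program itself.
    """
    scripts: Dict[str, str] = {}
    for i, (path, setup) in enumerate(stages):
        # The exec target is the next stage script, or the program at the end.
        target = stages[i + 1][0] if i + 1 < len(stages) else program
        lines = ["#!/bin/sh", setup, f"exec {target}"]
        # Empty setup commands are skipped so the script stays minimal.
        scripts[path] = "\n".join(line for line in lines if line) + "\n"
    return scripts
```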
.PP
The experiments depend strongly on the cluster where they run, as it
heavily affects the results.
The software used for the benchmark is carefully configured for the
hardware used in the execution.
In particular, the experiments are designed to run in the MareNostrum 4
cluster with the SLURM workload manager.
In the future we plan to add support for other clusters, so that the
experiments can be executed on other machines.
.\"#####################################################################
.NH 1
Isolation
.LP
The benchmark is designed so that both the compilation of every software
package and the execution of the experiment are performed under strict
conditions.
Therefore, we can guarantee that two executions of the same experiment
actually run the same program in the same environment.
.PP
All the software used by an experiment is included in the
.I "nix store" ,
which is, by convention, located in the
.CW /nix
directory.
Unfortunately, it is common for libraries to try to load
software from other paths like
.CW /usr
or
.CW /lib .
It is also common for configuration files to be loaded from
.CW /etc
and from the home directory of the user that runs the experiment.
Additionally, the libraries used in the experiment recognize some
environment variables which change their behavior.
As we cannot control the software and configuration files in those
directories, we could not guarantee that the execution behaves as
intended.
.PP
To avoid this problem, we create a secure
.I sandbox
where only the files in the nix store are available (with some
exceptions).
Therefore, even if the libraries try to access a path outside the nix
store, they will find that the files are not there.
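.PP
The visibility rule of the sandbox can be modeled as a path check.
The following Python sketch is only a model of that rule, not the way
the isolate stage enforces it.
The allowlist is an assumption built from the exceptions mentioned for
the job submission (the SLURM configuration file and the munge socket);
the munge socket path and the helper name
.CW visible_in_sandbox
are hypothetical.

```python
import posixpath
from typing import Iterable

# Assumed allowlist: the nix store plus the exceptions needed by SLURM.
# The munge socket location is an assumption; it depends on the cluster.
ALLOWED_PREFIXES = ("/nix",)
ALLOWED_FILES = (
    "/etc/slurm/slurm.conf",
    "/var/run/munge/munge.socket.2",
)


def visible_in_sandbox(
    path: str,
    prefixes: Iterable[str] = ALLOWED_PREFIXES,
    files: Iterable[str] = ALLOWED_FILES,
) -> bool:
    """Return True if `path` would be reachable inside this sandbox model."""
    # Collapse "..", "." and repeated slashes (symlinks are not resolved).
    norm = posixpath.normpath(path)
    if norm in files:
        return True
    # Match whole path components, so "/nixos" is not inside "/nix".
    return any(norm == p or norm.startswith(p + "/") for p in prefixes)
```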
.\"#####################################################################
.NH 1
Execution stages
.LP
There are several predefined stages which form the
.I standard
execution pipeline.
The standard pipeline is divided into two main parts:
1) connecting to the target machine and submitting a job to SLURM, and
2) executing the job itself.
.NH 2
Job submission
.LP
Three stages are involved in the job submission.
The
.I trebuchet
stage connects via
.I ssh
to the target machine and executes the next stage there.
Once in the target machine, the
.I isolate
stage is executed to enter the sandbox.
Finally, the
.I sbatch
stage runs the
.I sbatch(1)
program with a job script that simply executes the next stage.
The sbatch program reads the
.CW /etc/slurm/slurm.conf
file from outside the sandbox, so we must explicitly make this file
available, as well as the
.I munge
socket used for authentication.
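.PP
The two ends of the job submission can be sketched as the commands they
build.
In the following Python sketch the helper names and the paths are
hypothetical; it only assumes the usual
.I ssh
invocation, where the remote command is interpreted by the remote shell,
and the usual
.I sbatch(1)
invocation, where the job script follows the options.

```python
import shlex
from typing import List


def trebuchet_command(host: str, next_stage: str) -> List[str]:
    """Command that runs the next stage (isolate) on the target machine."""
    # ssh passes the remote command to a shell, so quote it defensively.
    return ["ssh", host, shlex.quote(next_stage)]


def job_script(next_stage: str) -> str:
    """Minimal batch script whose only job is to run the next stage."""
    return f"#!/bin/sh\nexec {shlex.quote(next_stage)}\n"


def sbatch_command(script_path: str) -> List[str]:
    """Submit the job script; further sbatch options could precede it."""
    return ["sbatch", script_path]
```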
.PP
The rationale behind running sbatch from the sandbox is that the options
provided in environment variables override the options from the job
script, so a stray variable in the user environment could silently
change how the job is run.
Therefore, we avoid this problem by running sbatch from the sandbox,
where potentially dangerous environment variables have been removed.
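.PP
For example, sbatch reads input environment variables such as
.CW SBATCH_PARTITION
or
.CW SBATCH_TIMELIMIT ,
and their values take precedence over the
.CW #SBATCH
lines of the job script.
The following Python sketch shows one possible filtering rule applied
before calling sbatch.
The rule (drop every variable with the
.CW SBATCH_
prefix) and the helper names are assumptions for illustration; the
sandbox may remove many more variables.

```python
import subprocess
from typing import Dict, List, Mapping

# Assumed rule: any SBATCH_* input variable could override the job script.
OVERRIDING_PREFIXES = ("SBATCH_",)


def sanitize_environment(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `env` without the variables matched by the rule."""
    return {
        name: value
        for name, value in env.items()
        if not name.startswith(OVERRIDING_PREFIXES)
    }


def submit(script_path: str, env: Mapping[str, str]) -> None:
    # sbatch no longer receives SBATCH_* overrides from the caller.
    cmd: List[str] = ["sbatch", script_path]
    subprocess.run(cmd, env=sanitize_environment(env), check=True)
```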
.NH 2
Setting up the environment
.LP
Once the job has been selected for execution, the SLURM daemon allocates
the resources and then selects one of the nodes to run the job script
(the script is not executed in parallel).
Additionally, the job script is executed from a child process, forked
from one of the SLURM processes, which runs outside the sandbox.
Therefore, we first run the
.I isolate
stage to enter the sandbox again.
.PP
The next stage is called
.I control
and determines if enough data has been generated by the experiment or if
it should continue repeating the execution.
Currently, it is only implemented as a simple loop that runs the next
stage a fixed number of times.
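.PP
The current behavior of the control stage amounts to the loop below,
written as a Python sketch.
The function names and the way the number of repetitions is passed are
assumptions; a criterion that stops once enough data has been collected
would replace the fixed
.CW range .

```python
import subprocess
from typing import Callable, Sequence


def control(run_once: Callable[[int], None], loops: int) -> None:
    """Run the next stage a fixed number of times, one after another."""
    for iteration in range(loops):
        run_once(iteration)


def next_stage_runner(argv: Sequence[str]) -> Callable[[int], None]:
    """Wrap the next stage command; a failing run raises and stops the loop."""
    def run_once(_iteration: int) -> None:
        subprocess.run(list(argv), check=True)
    return run_once
```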
.PP
The following stage is
.I srun ,
which usually launches several copies of the next stage to run in
parallel (when using more than one task).
It runs one copy per task, effectively creating one process per task.
The set of CPUs available to each process is determined by the
.I --cpu-bind
parameter, documented in the
.I srun(1)
manual, and it is crucial to set it correctly.
Appending the
.I verbose
value to the
.I --cpu-bind
option causes srun to print the affinity assigned to each task, so that
it can be reviewed in the execution log.
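.PP
As an illustration, one way to give each task a disjoint block of CPUs
is the
.CW mask_cpu
binding type of
.I --cpu-bind ,
which takes one hexadecimal mask per task on each node.
The Python sketch below computes such masks for contiguous blocks; the
block layout, the helper names and the assumption that the CPUs of a
node divide evenly among its tasks are illustrative, not the setting
used by the Garlic experiments.

```python
from typing import List


def block_masks(cpus_per_node: int, tasks_per_node: int) -> List[int]:
    """One bit mask per task, giving each task a contiguous block of CPUs."""
    if tasks_per_node <= 0 or cpus_per_node % tasks_per_node != 0:
        raise ValueError("CPUs must divide evenly among the tasks")
    width = cpus_per_node // tasks_per_node
    block = (1 << width) - 1  # `width` consecutive bits set
    return [block << (task * width) for task in range(tasks_per_node)]


def cpu_bind_option(masks: List[int]) -> str:
    """srun option that prints the affinity (verbose) and binds by mask."""
    return "--cpu-bind=verbose,mask_cpu:" + ",".join(hex(m) for m in masks)
```

.PP
For instance, with 8 CPUs and 2 tasks per node the sketch produces the
option
.CW --cpu-bind=verbose,mask_cpu:0xf,0xf0 .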
.PP
The mechanism by which srun executes multiple processes is the same one
used by sbatch: each process is forked from a SLURM daemon running in
the compute nodes.
Therefore, the execution begins outside the sandbox.
The next stage is
.I isolate ,
which enters the sandbox again in every task (from now on, all stages
run in parallel).
.PP
At this point in the execution, we are ready to run the actual program
that is the subject of the experiment.
Programs usually require some options to be passed on the command line.
The
.I argv
stage sets the arguments, and optionally some environment variables, and
executes the last stage, the
.I program .
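.PP
A minimal Python sketch of the argv stage follows, with hypothetical
helper names.
It separates building the command line and environment from the final
.CW os.execve
call, which replaces the current process with the program so that no
extra process is left behind between the stage and the program.

```python
import os
from typing import Dict, List, Mapping, NoReturn, Optional, Tuple


def build_argv_stage(
    program: str,
    args: List[str],
    extra_env: Optional[Mapping[str, str]] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the argument vector and environment for the program stage."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(extra_env or {})
    # By convention argv[0] is the program itself.
    return [program, *args], env


def exec_program(argv: List[str], env: Mapping[str, str]) -> NoReturn:
    """Replace the current process with the program; never returns."""
    os.execve(argv[0], argv, env)
```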
.NH 2
Stage overview
.LP
The standard execution pipeline contains the stages listed in Table 1,
in order of execution.
Additional stages can be placed before the argv stage to modify the
execution; debugging programs and other options are usually included
there.
.KF
.TS
center tab(;);
lB cB cB cB
l c c c.
_
Stage;Target;Safe;Copies
_
trebuchet;no;no;no
isolate;yes;no;no
sbatch;yes;yes;no
isolate;yes;no;no
control;yes;yes;no
srun;yes;yes;no
isolate;yes;no;yes
argv;yes;yes;yes
program;yes;yes;yes
_
.TE
.QP
.B "Table 1" :
The stages of a standard execution pipeline.
The
.B target
column states whether the stage runs in the target cluster;
.B safe
whether it runs in the sandbox; and
.B copies
whether several instances of the stage run in parallel.
.QE
.KE
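.PP
Table 1 can also be written down as data, which makes explicit where
extra stages are allowed.
The Python sketch below encodes the standard pipeline, inserts a
hypothetical
.CW debugger
stage right before argv, and checks that the stage following every
isolate runs in the sandbox.
The
.CW Stage
type and the helper names are assumptions, not part of Garlic.

```python
from typing import Iterable, List, NamedTuple


class Stage(NamedTuple):
    name: str
    target: bool  # runs in the target cluster
    safe: bool    # runs inside the sandbox
    copies: bool  # several instances run in parallel


STANDARD_PIPELINE: List[Stage] = [
    Stage("trebuchet", target=False, safe=False, copies=False),
    Stage("isolate", target=True, safe=False, copies=False),
    Stage("sbatch", target=True, safe=True, copies=False),
    Stage("isolate", target=True, safe=False, copies=False),
    Stage("control", target=True, safe=True, copies=False),
    Stage("srun", target=True, safe=True, copies=False),
    Stage("isolate", target=True, safe=False, copies=True),
    Stage("argv", target=True, safe=True, copies=True),
    Stage("program", target=True, safe=True, copies=True),
]


def insert_before_argv(pipeline: List[Stage], extra: Iterable[Stage]) -> List[Stage]:
    """Return a new pipeline with the `extra` stages placed just before argv."""
    index = next(i for i, stage in enumerate(pipeline) if stage.name == "argv")
    return pipeline[:index] + list(extra) + pipeline[index:]


def isolate_always_leads_to_sandbox(pipeline: List[Stage]) -> bool:
    """Check that the stage following each isolate runs in the sandbox."""
    return all(
        pipeline[i + 1].safe
        for i, stage in enumerate(pipeline[:-1])
        if stage.name == "isolate"
    )
```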