Update execution doc with isolation
This commit is contained in:
parent
251103ffd3
commit
f2b39decba
@@ -1,5 +1,5 @@
.TL
Garlic: the execution pipeline
.AU
Rodrigo Arias Mallo
.AI
@@ -10,8 +10,8 @@ This document covers the execution of experiments in the Garlic
benchmark, which are performed under strict conditions. The several
stages of the execution are documented so the experimenter can have a
global overview of how the benchmark runs under the hood.
The measurements taken during the execution of the experiment are stored
in a file used in subsequent processing steps.
.AE
.\"#####################################################################
.nr GROWPS 3
@@ -24,44 +24,50 @@ stored in a file which will be used in posterior processing steps.
Introduction
.LP
Every experiment in the Garlic
benchmark is controlled by a single
.I nix
file placed in the
.CW garlic/exp
subdirectory.
Experiments are formed by several
.I "experimental units"
or simply
.I units .
A unit is the result of each unique configuration of the experiment
(typically one combination from the cartesian product of all factors) and
consists of several shell scripts executed sequentially to set up the
.I "execution environment" ,
which finally launch the actual program being analyzed.
The scripts that prepare the environment and the program itself are
called the
.I stages
of the execution and altogether form the
.I "execution pipeline"
or simply the
.I pipeline .
The experimenter must know in detail all the stages
involved in the pipeline, as they have a large impact on the execution.
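.PP
Conceptually, each stage is a small shell script that performs its own
setup and then hands control to the next stage. The following sketch is
only illustrative; the real stages are generated by nix and live in the
nix store:
.DS L
.ft CW
#!/bin/sh
# Hypothetical stage script: do this stage's setup, then run the next
# stage. The $next variable stands for the nix store path of the next
# stage and is only a placeholder for this example.
export EXAMPLE_SETTING=1      # environment prepared by this stage
exec "$next"                  # replace this process with the next stage
.ft
.DE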
.PP
Additionally, the execution time is impacted by the target machine in
which the experiments run. The software used for the benchmark is
carefully configured and tuned for the hardware used in the execution;
in particular, the experiments are designed to run in the MareNostrum 4
cluster with the SLURM workload manager and the Omni-Path
interconnection network. In the future we plan to add
support for other clusters in order to execute the experiments on other
machines.
.\"#####################################################################
.NH 1
Isolation
.LP
The benchmark is designed so that both the compilation of every software
package and the execution of the experiment are performed under strict
conditions. We can ensure that two executions of the same experiment are
actually running the same program in the same software environment.
.PP
All the software used by an experiment is included in the
.I "nix store"
which is, by convention, located at the
.CW /nix
directory. Unfortunately, it is common for libraries to try to load
software from other paths like
@@ -74,130 +80,167 @@ and from the home directory of the user that runs the experiment.
Additionally, some environment variables are recognized by the libraries
used in the experiment, which change their behavior. As we cannot
control the software and configuration files in those directories, we
cannot guarantee that the execution behaves as intended.
.PP
In order to avoid this problem, we create a
.I sandbox
where only the files in the nix store are available (with some other
exceptions). Therefore, even if the libraries try to access any path
outside the nix store, they will find that the files are not there
anymore. Additionally, the environment variables are cleared before
entering the environment (with some exceptions as well).
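.PP
As an illustration only (the
.I isolate
stage may use a different mechanism), a sandbox with these properties
could be entered with a tool such as
.I bwrap(1) ,
exposing only the nix store and starting from an empty environment:
.DS L
.ft CW
# Hypothetical sketch, not the actual isolate implementation: only the
# nix store (read-only), /proc and /dev are visible inside, and the
# environment is cleared. The shell itself must come from the nix
# store; $shell is a placeholder for its store path.
bwrap --ro-bind /nix /nix --proc /proc --dev /dev --clearenv "$shell"
.ft
.DE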
.\"#####################################################################
.NH 1
Execution pipeline
.LP
Several predefined stages form the
.I standard
execution pipeline and are defined in the
.I stdPipeline
array. The standard pipeline prepares the resources and the environment
to run a program (usually in parallel) in the compute nodes. It is
divided into two main parts:
connecting to the target machine to submit a job, and executing the job.
Finally, the complete execution pipeline ends by running the actual
program, which is not part of the standard pipeline, as it must be
defined differently for each program.
.NH 2
Job submission
.LP
Some stages are involved in the job submission: the
.I trebuchet
stage connects via
.I ssh
to the target machine and executes the next stage there. Once on the
target machine, the
.I isolate
stage is executed to enter the sandbox and the
.I experiment
stage is executed, running the experiment which launches several
.I unit
stages.
.PP
Each unit executes a
.I sbatch
stage which runs the
.I sbatch(1)
program with a job script that simply executes the next stage. The
sbatch program internally reads the
.CW /etc/slurm/slurm.conf
file from outside the sandbox, so we must explicitly allow this file to
be available, as well as the
.I munge
socket used for authentication by the SLURM daemon. Once the jobs are
submitted to SLURM, the experiment stage ends and the trebuchet finishes
the execution. The jobs will be queued for execution without any other
intervention from the user.
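.PP
A minimal job script of this kind could look as follows (the resource
options and the
.CW $next
path are placeholders, not the exact values used by the benchmark):
.DS L
.ft CW
#!/bin/sh
#SBATCH --nodes=4
#SBATCH --time=01:00:00
#SBATCH --exclusive
# The job script only chains into the next stage of the pipeline.
exec "$next"
.ft
.DE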
.PP
The rationale behind running sbatch from the sandbox is that the
options provided in environment variables override the options from the
job script. By running sbatch from the sandbox, where the interfering
environment variables have been removed, we avoid this problem. The
sbatch program is also provided in the
.I "nix store" ,
with a version compatible with the SLURM daemon running in the target
cluster.
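.PP
For example, a variable like
.CW SBATCH_PARTITION
left in the submission environment can silently override the partition
requested with an
.CW #SBATCH
directive inside the job script:
.DS L
.ft CW
# Illustrative only: the environment variable takes precedence over the
# equivalent option written in the job script.
SBATCH_PARTITION=debug sbatch ./job.sh
.ft
.DE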
.NH 2
Job execution
.LP
Once a unit job has been selected for execution, SLURM
allocates the resources (usually several nodes) and then selects one of
the nodes to run the job script: it is not executed in parallel yet.
The job script runs from a child process forked from one of the SLURM
daemon processes, which are outside the sandbox. Therefore, we first run the
.I isolate
stage
to enter the sandbox again.
.PP
The next stage is called
.I control
and determines if enough data has been generated by the experiment unit
or if it should continue repeating the execution. At the current time,
it is only implemented as a simple loop that runs the next stage a fixed
number of times (by default, 30 times).
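.PP
In essence, the control stage behaves like the following loop (the
.CW $next
path and the repetition count are placeholders):
.DS L
.ft CW
# Simplified sketch of the control stage: repeat the rest of the
# pipeline a fixed number of times.
for run in $(seq 30); do
    "$next"
done
.ft
.DE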
.PP
The following stage is
.I srun
which launches several copies of the next stage to run in
parallel (when using more than one task). It runs one copy per task,
effectively creating one process per task. The CPU affinity is
configured by the parameter
.I --cpu-bind
and it is important to set it correctly (see more details in the
.I srun(1)
manual). Appending the
.I verbose
value to the cpu bind option causes srun to print the assigned affinity
of each task, which is very valuable when examining the execution log.
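.PP
As an illustration, an invocation along these lines prints the affinity
mask chosen for every task (the task count, the binding and the program
name are examples, not the values used by the benchmark):
.DS L
.ft CW
# One process per task; report the selected CPU binding in the log.
# ./next-stage stands for the next stage of the pipeline.
srun --ntasks=48 --cpu-bind=verbose,cores ./next-stage
.ft
.DE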
.PP
The mechanism by which srun executes multiple processes is the same used
by sbatch: it forks from a SLURM daemon running in the compute nodes.
Therefore, the execution begins outside the sandbox. The next stage is
.I isolate
which enters the sandbox again in every task. All remaining stages are
now running in parallel.
.\" ###################################################################
.NH 2
The program
.LP
At this point in the execution, the standard pipeline has been
completely executed, and we are ready to run the actual program that is
the matter of the experiment. Usually, programs require some arguments
to be passed in the command line. The
.I exec
stage sets the arguments (and optionally some environment variables) and
executes the last stage, the
.I program .
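.PP
A minimal exec stage could therefore look like this sketch (the program
name, arguments and variable are placeholders chosen for the example):
.DS L
.ft CW
#!/bin/sh
# Hypothetical exec stage: fix the environment and the command line,
# then replace this process with the program under study.
export OMP_NUM_THREADS=24
exec ./app --size 1024 --iterations 10
.ft
.DE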
.PP
The experimenters are required to define these last stages, as they
specify the exact way in which the program must be executed.
Additional stages may be included before or after the program run, so
they can perform additional steps.
.\" ###################################################################
.NH 2
Stage overview
.LP
The complete execution pipeline, built on the standard pipeline, is
shown in Table 1, together with some properties of the execution
stages.
.KF
.TS
center;
lB cB cB cB cB cB
l c c c c c.
_
Stage	Target	Safe	Copies	User	Std
_
trebuchet	xeon	no	no	yes	yes
isolate	login	no	no	yes	yes
experiment	login	yes	no	no	yes
unit	login	yes	no	no	yes
sbatch	login	yes	no	no	yes
_
isolate	comp	no	no	no	yes
control	comp	yes	no	no	yes
srun	comp	yes	no	no	yes
isolate	comp	no	yes	no	yes
_
exec	comp	yes	yes	no	no
program	comp	yes	yes	no	no
_
.TE
.QS
.B "Table 1" :
The stages of a complete execution pipeline. The
.B target
column determines where the stage is running,
.B safe
states if the stage begins the execution inside the sandbox,
.B user
if it can be executed directly by the user,
.B copies
if there are several instances running in parallel and
.B std
if it is part of the standard execution pipeline.
.QE
.KE