Update execution doc with isolation

Rodrigo Arias 2020-10-13 12:13:56 +02:00
parent 251103ffd3
commit f2b39decba


@@ -1,5 +1,5 @@
 .TL
-Garlic execution
+Garlic: the execution pipeline
 .AU
 Rodrigo Arias Mallo
 .AI
@@ -10,8 +10,8 @@ This document covers the execution of experiments in the Garlic
 benchmark, which are performed under strict conditions. The several
 stages of the execution are documented so the experimenter can have a
 global overview of how the benchmark runs under the hood.
-During the execution of the experiments, the results are
-stored in a file which will be used in posterior processing steps.
+The measurements taken during the execution of the experiment are stored
+in a file used in later processing steps.
 .AE
 .\"#####################################################################
 .nr GROWPS 3
@@ -24,44 +24,50 @@ stored in a file which will be used in posterior processing steps.
 Introduction
 .LP
 Every experiment in the Garlic
-benchmark is controled by one
+benchmark is controlled by a single
 .I nix
-file.
-An experiment consists of several shell scripts which are executed
-sequentially and perform several tasks to setup the
+file placed in the
+.CW garlic/exp
+subdirectory.
+Experiments are formed by several
+.I "experimental units"
+or simply
+.I units .
+A unit is the result of each unique configuration of the experiment
+(typically the cartesian product of all factors) and
+consists of several shell scripts executed sequentially to set up the
 .I "execution environment" ,
-which finally launch the actual program that is being analyzed.
+which finally launch the actual program being analyzed.
 The scripts that prepare the environment and the program itself are
 called the
 .I stages
-of the execution, which altogether form the
+of the execution and altogether form the
 .I "execution pipeline"
 or simply the
 .I pipeline .
 The experimenter must know in detail all the stages
-involved in the pipeline, as they can affect with great impact the
-result of the execution.
+involved in the pipeline, as they have a large impact on the execution.
 .PP
-The experiments have a very strong dependency on the cluster where they
-run, as the results will be heavily affected. The software used for the
-benchmark is carefully configured for the hardware used in the
-execution. In particular, the experiments are designed to run in
-MareNostrum 4 cluster with the SLURM workload manager. In the future we
-plan to add support for other clusters, in order to execute the
-experiments in other machines.
+Additionally, the execution time is affected by the target machine on
+which the experiments run. The software used for the benchmark is
+carefully configured and tuned for the hardware used in the execution;
+in particular, the experiments are designed to run in the MareNostrum 4
+cluster with the SLURM workload manager and the Omni-Path
+interconnection network. In the future we plan to add
+support for other clusters in order to execute the experiments in other
+machines.
 .\"#####################################################################
 .NH 1
 Isolation
 .LP
 The benchmark is designed so that both the compilation of every software
 package and the execution of the experiment is performed under strict
-conditions. Therefore, we can provide a guarantee that two executions
-of the same experiment are actually running the same program in the same
-environment.
+conditions. We can ensure that two executions of the same experiment are
+actually running the same program in the same software environment.
 .PP
 All the software used by an experiment is included in the
 .I "nix store"
-which is, by convention, located in the
+which is, by convention, located at the
 .CW /nix
 directory. Unfortunately, it is common for libraries to try to load
 software from other paths like
@@ -74,130 +74,167 @@ and from the home directory of the user that runs the experiment.
 Additionally, some environment variables are recognized by the libraries
 used in the experiment, which change their behavior. As we cannot
 control the software and configuration files in those directories, we
-coudn't guarantee that the execution behaves as intended.
+cannot guarantee that the execution behaves as intended.
 .PP
-In order to avoid this problem, we create a secure
+In order to avoid this problem, we create a
 .I sandbox
 where only the files in the nix store are available (with some other
 exceptions). Therefore, even if the libraries try to access any path
 outside the nix store, they will find that the files are not there
-anymore.
+anymore. Additionally, the environment variables are cleared before
+entering the environment (with some exceptions as well).
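The effect of clearing the environment variables can be illustrated with a plain shell sketch; this is not the actual isolate stage, only the idea of running a child process with an emptied environment and a small whitelist:

```shell
# Illustrative only: run a child shell with an empty environment,
# keeping a single whitelisted variable (PATH), similar in spirit to
# how the isolate stage clears variables before entering the sandbox.
SECRET=leak env -i PATH=/usr/bin:/bin \
    sh -c 'echo "SECRET is ${SECRET:-unset}"'
```

Even though SECRET is set when env runs, the child shell started with `env -i` does not see it.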
 .\"#####################################################################
 .NH 1
-Execution stages
+Execution pipeline
 .LP
-There are several predefined stages which form the
+Several predefined stages form the
 .I standard
-execution pipeline. The standard pipeline is divided in two main parts:
-1) connecting to the target machine and submiting a job to SLURM, and 2)
-executing the job itself.
+execution pipeline and are defined in the
+.I stdPipeline
+array. The standard pipeline prepares the resources and the environment
+to run a program (usually in parallel) in the compute nodes. It is
+divided into two main parts:
+connecting to the target machine to submit a job and executing the job.
+Finally, the complete execution pipeline ends by running the actual
+program, which is not part of the standard pipeline, as it must be
+defined differently for each program.
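The idea of a pipeline of stages, where each stage performs its own setup and then executes the next one, can be sketched in plain shell. This is illustrative only, not the Garlic implementation (the real stages are generated from nix):

```shell
# Illustrative sketch of chained stages (not the Garlic implementation):
# each stage does its own setup and then runs the next one.
stage_program() { echo "running program: $*"; }
stage_exec()    { stage_program --input data.bin; }    # set the arguments
stage_control() { i=0; while [ $i -lt 3 ]; do stage_exec; i=$((i+1)); done; }
stage_control
```

Running `stage_control` executes the program stage three times, mirroring how the control stage repeats the next stage a fixed number of times.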
 .NH 2
 Job submission
 .LP
-Three stages are involved in the job submision. The
+Several stages are involved in the job submission: the
 .I trebuchet
 stage connects via
 .I ssh
 to the target machine and executes the next stage there. Once in the
 target machine, the
 .I isolate
-stage is executed to enter the sandbox. Finally, the
+stage is executed to enter the sandbox and the
+.I experiment
+stage is executed, running the experiment, which launches several
+.I unit
+stages.
+.PP
+Each unit executes a
 .I sbatch
-stage runs the
+stage which runs the
 .I sbatch(1)
-program with a job script with simply executes the next stage. The
-sbatch program reads the
+program with a job script that simply executes the next stage. The
+sbatch program internally reads the
 .CW /etc/slurm/slurm.conf
 file from outside the sandbox, so we must explicitly allow this file to
-be available as well as the
+be available, as well as the
 .I munge
-socket, used for authentication.
+socket used for authentication by the SLURM daemon. Once the jobs are
+submitted to SLURM, the experiment stage ends and the trebuchet finishes
+the execution. The jobs will be queued for execution without any other
+intervention from the user.
 .PP
-The rationale behind running sbatch from the sandbox is that the options
-provided in enviroment variables override the options from the job
-script. Therefore, we avoid this problem by running sbatch from the
-sandbox, where potentially dangerous environment variables were removed.
+The rationale behind running sbatch from the sandbox is that the
+options provided in environment variables override the options from the
+job script. Therefore, we avoid this problem by running sbatch from the
+sandbox, where the interfering environment variables are removed. The
+sbatch program is also provided in the
+.I "nix store" ,
+with a version compatible with the SLURM daemon running in the target
+cluster.
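A job script of the kind described above could look like the following sketch. The directives and stage paths are illustrative, not taken from Garlic; the point is that SBATCH_* environment variables, when present, override these directives, which is why sbatch is run from the cleared sandbox environment:

```shell
#!/bin/sh
# Illustrative job script (not the Garlic source): the directives ask
# SLURM for resources, and the body only re-enters the sandbox and
# runs the next stage (paths are hypothetical).
#SBATCH --job-name=garlic-unit
#SBATCH --nodes=2
#SBATCH --time=00:30:00
exec ./isolate ./control
```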
 .NH 2
-Seting up the environment
+Job execution
 .LP
-Once the job has been selected for execution, the SLURM daemon allocates
-the resources and then selects one of the nodes to run the job script
-(is not executed in parallel). Additionally, the job script is executed
-from a child process, forked from on of the SLURM processes, which is
-outside the sandbox. Therefore, we first run the
+Once a unit job has been selected for execution, SLURM
+allocates the resources (usually several nodes) and then selects one of
+the nodes to run the job script: it is not executed in parallel yet.
+The job script runs from a child process forked from one of the SLURM
+daemon processes, which are outside the sandbox. Therefore, we first
+run the
 .I isolate
 stage
 to enter the sandbox again.
 .PP
 The next stage is called
 .I control
-and determines if enough data has been generated by the experiment or if
-it should continue repeating the execution. At the current time, is only
-implemented as a simple loop that runs the next stage a fixed amount of
-times.
+and determines if enough data has been generated by the experiment unit
+or if it should continue repeating the execution. At the current time,
+it is only implemented as a simple loop that runs the next stage a
+fixed number of times (30 by default).
 .PP
 The following stage is
 .I srun
-which usually launches several copies of the next stage to run in
-parallel (when using more than one task). Runs one copy per task,
-effectively creating one process per task. The set of CPUs available to
-each process is computed by the parameter
+which launches several copies of the next stage to run in
+parallel (when using more than one task), running one copy per task and
+effectively creating one process per task. The CPU affinity is
+configured by the parameter
 .I --cpu-bind
-and is crucial to set it correctly; is documented in the
+and it is important to set it correctly (see the
 .I srun(1)
-manual. Apending the
+manual). Appending the
 .I verbose
 value to the cpu bind option causes srun to print the assigned affinity
-of each task so that it can be reviewed in the execution log.
+of each task, which is very valuable when examining the execution log.
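For example, an invocation of this kind (the stage paths are hypothetical, the flags are from srun(1)) makes srun report the affinity chosen for each task in the log:

```shell
# Illustrative srun invocation: one process per task, bound to cores,
# with the chosen affinity printed to the log by the 'verbose' prefix.
srun --ntasks=48 --cpu-bind=verbose,cores ./isolate ./exec ./program
```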
 .PP
 The mechanism by which srun executes multiple processes is the same used
 by sbatch: it forks from a SLURM daemon running in the computing nodes.
 Therefore, the execution begins outside the sandbox. The next stage is
 .I isolate
-which enters again the sandbox in every task (from now on, all stages
-are running in parallel).
-.PP
-At this point in the execution, we are ready to run the actual program
-that is the matter of the experiment. Usually, the programs require some
-argument options to be passed in the command line. The
-.I argv
-stage sets the arguments and optionally some environment variables and
+which enters the sandbox again in every task. All remaining stages are
+now running in parallel.
+.\" ###################################################################
+.NH 2
+The program
+.LP
+At this point in the execution, the standard pipeline has been
+completely executed, and we are ready to run the actual program that is
+the matter of the experiment. Usually, programs require some arguments
+to be passed in the command line. The
+.I exec
+stage sets the arguments (and optionally some environment variables) and
 executes the last stage, the
 .I program .
+.PP
+The experimenters are required to define these last stages, as they
+specify the exact way in which the program must be executed.
+Additional stages may be included before or after the program run to
+perform additional steps.
+.\" ###################################################################
 .NH 2
 Stage overview
 .LP
-The standard execution pipeline contains the stages listed in the table
-1, ordered by the execution time. Additional stages can be placed before
-the argv stage, to modify the execution. Usually debugging programs and
-other options can be included there.
+The complete execution pipeline built from the standard pipeline is
+shown in Table 1, along with some properties of each execution
+stage.
 .KF
 .TS
 center;
-lB cB cB cB
-l c c c.
+lB cB cB cB cB cB
+l c c c c c.
 _
-Stage	Target	Safe	Copies
+Stage	Target	Safe	Copies	User	Std
 _
-trebuchet	no	no	no
-isolate	yes	no	no
-sbatch	yes	yes	no
-isolate	yes	no	no
-control	yes	yes	no
-srun	yes	yes	no
-isolate	yes	no	yes
-argv	yes	yes	yes
-program	yes	yes	yes
+trebuchet	xeon	no	no	yes	yes
+isolate	login	no	no	yes	yes
+experiment	login	yes	no	no	yes
+unit	login	yes	no	no	yes
+sbatch	login	yes	no	no	yes
+_
+isolate	comp	no	no	no	yes
+control	comp	yes	no	no	yes
+srun	comp	yes	no	no	yes
+isolate	comp	no	yes	no	yes
+_
+exec	comp	yes	yes	no	no
+program	comp	yes	yes	no	no
 _
 .TE
-.QP
+.QS
 .B "Table 1" :
-The stages of a standard execution pipeline. The
+The stages of a complete execution pipeline. The
 .B target
-column determines whether the stage is running in the target cluster;
+column determines where the stage is running,
 .B safe
-states if the stage is running in the sandbox and
+states if the stage begins the execution inside the sandbox,
+.B user
+if it can be executed directly by the user,
 .B copies
-if there are several instances of the stages running in parallel.
+if there are several instances running in parallel and
+.B std
+if it is part of the standard execution pipeline.
 .QE
 .KE