.\"usage: NS title
.de NS \" New Slide
.SK
.ev gp-top
.fam H
.vs 1.5m
.ll \\n[@ll]u
.lt \\n[@ll]u
.rs
.sp 2v
.ps +5
\\$*
.ps -5
.sp 1.5v
.br
.ev
..
.\" Remove headers
.de TP
..
.\" Bigger page number in footer
.de EOP
.fam H
.ps +2
. ie o .tl \\*[pg*odd-footer]
. el .tl \\*[pg*even-footer]
. ds hd*format \\g[P]
. af P 0
. ie (\\n[P]=1)&(\\n[N]=1) .tl \\*[pg*header]
. el .tl \\*[pg*footer]
. af P \\*[hd*format]
. tl ''\\*[Pg_type!\\n[@copy_type]]''
..
.\" Remove top and bottom margin
.VM 0 0
.\"
.\"
.\" Set virtual page dimensions for a physical size of 16x12 cm
.PGFORM 14c 12c 1c 1
.ND "November 24, 2020"
.\" .vs 1.5m
.S C 1.5m
.fam H
.\".PH "'cosas'''"
.COVER ms
.de cov@print-date
.DS C
.fam H
.B
\\*[cov*new-date]
.DE
..
.TL
.ps 20
.fam H
Garlic update
.AF "Barcelona Supercomputing Center"
.AU "Rodrigo Arias Mallo"
.COVEND
.PF "'''%'"
.\" Turn off justification
.SA 0
.\".PF '''%'
.\"==================================================================
.NS "Changelog"
Important changes since the last meeting (2020-09-23)
.BL
.LI
Execution of experiments is now \fBisolated\fP: no $HOME or /usr at run time
.LI
Added a \fBpostprocess\fP pipeline
.LI
New \fBgarlic(1)\fP helper tool (manual included)
.LI
A plot has an experiment result as \fBdependency\fP
.LI
Experiments run on demand based on article \fBfigures\fP
.LI
Fast pkg overrides (MPI)
.LE 1
.\"==================================================================
.NS "Overview"
Dependency graph of a complete experiment that produces a figure. Each box
is a derivation and arrows represent \fBbuild dependencies\fP.
.DS CB
.S -3.5
.PS
circlerad=0.3;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
P: box "Program"
arrow
box "..."
arrow
T: box "Trebuchet"
arrow
box "Result" "(MN4)" dashed
arrow
R: box "ResultTree"
arrow
box "..."
arrow
F: box "Figure"
arrow <-> from P.nw + (0, 0.2) to T.ne + (0, 0.2) \
"Execution pipeline (EP)" above
arrow <-> from R.nw + (0, 0.2) to F.ne + (0, 0.2) \
"Postprocess pipeline (PP)" above
.PE
.S P P
.DE
.P
The \fBResult\fP is not covered by nix (yet). This is what it looks like
when executed:
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
circle "Build EP"
arrow
circle "Run EP"
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
R: box "ResultTree"
arrow
circle "Build PP"
arrow
F: box "Figure"
.PE
.S P P
.DE
.P
Notice that the dependency order is not the same as the execution order.
.\"==================================================================
.NS "Building the execution pipeline (EP)"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP" fill
arrow
R: circle "Run EP"
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
box "ResultTree"
arrow
circle "Build PP"
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
Run nix-build with the experiment name:
.P
.VERBON
xeon07$ nix-build -A exp.nbody.baseline
\&...
/nix/store/5zhmdzi5mf0mfsran74cxngn07ba522m-trebuchet
.VERBOFF
.P
This outputs the first stage (the trebuchet). All other stages
are built as dependencies, as they are required to build the trebuchet.
.\"==================================================================
.NS "Running the EP"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP" fill
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
box "ResultTree"
arrow
circle "Build PP"
arrow
F: box "Figure"
circlerad=0.2;
linewid=0.3;
T: circle at B + (0,-1.3) "trebu."
arrow
circle "runexp"
arrow
circle "isolate"
arrow
circle "exp."
arrow
circle "..."
arrow
circle "exec"
arrow
P: circle "program"
line from R.sw to T.nw dashed
line from R.se to P.n dashed
arrow <-> from T.w - (0, 0.35) to P.e - (0, 0.35) \
"Execution pipeline stages" below
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.SP 1m
.P
The stages are launched sequentially. Let's see what happens in each one.
.\"==================================================================
.NS "Execution pipeline"
.2C
List of stages required to run the program of the experiment:
.BL
.S -1
.LI
The
.B target
column determines where the stage is running.
.LI
.B Safe
states whether the stage begins its execution inside the isolated namespace.
.LI
.B User
states whether it can be executed directly by the user.
.LI
.B Copies
states whether several instances run in parallel.
.LI
.B Std
states whether it is part of the standard execution pipeline.
.LE
.S P P
.P
Sorted by the \fBexecution order\fP.
.\" Go to the next column
.NCOL
.KF
.defcolor white rgb #FFFFFF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
\m[white]\(rh\m[]\
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBtrebuchet\fP: connects via ssh to the target machine and executes the
next stage there.
.P
The target machine is set to MN4, which by default uses the host
\fBmn1\fP.
.P
Literally:
.P
.VERBON
ssh mn1 /path/to/next/stage
.VERBOFF
.P
You need to define the ssh config to be able to connect to mn1.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
\(rh \fBtrebuchet\fP xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBrunexp\fP: sets a few \fCGARLIC_*\fP environment variables used by the
benchmark and changes the current directory to the \fBout\fP directory.
.P
At build time, the next stages don't know these values (cyclic dependency),
so they are populated at execution time.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
\(rh \fBrunexp\fP login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBisolate\fP: once on the target machine, we enter an isolated
namespace to load the nix store.
.P
Notice that this and the previous stages require the \fBsh\fP shell to be
available on the target machine.
.P
They are not \fBsafe\fP, as they run code from the target machine.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
\(rh \fBisolate\fP login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBexperiment\fP: runs several units sequentially.
.P
Defines the \fCGARLIC_EXPERIMENT\fP environment variable.
.P
Creates a directory for the experiment and changes the current directory
there.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
\(rh \fBexperiment\fP login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBunit\fP: creates an index entry for the unit and the experiment.
.P
Creates a directory for the unit and changes the current directory
there.
.P
Copies the unit configuration into the \fCgarlic_config.json\fP file.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
\(rh \fBunit\fP login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBsbatch\fP: allocates resources and executes the next stage in the
first node.
.P
The execve call is performed by a SLURM daemon, so the next stage runs
\fBout\fP of the isolated environment.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
\(rh \fBsbatch\fP login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBisolate\fP: enters the isolated namespace again, with the nix store.
.P
Notice that we are now running in the compute node allocated by SLURM.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
\(rh \fBisolate\fP comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBcontrol\fP: runs the next stage several times.
.P
It is controlled by the \fCloops\fP attribute, which specifies the number
of runs.
.P
Creates a directory with the number of the run and enters it.
.P
Generated results are placed in this directory.
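.P
A minimal sketch of this loop in Python, not the actual garlic code: the
\fCrun_loops\fP name and its arguments are hypothetical, and numbering
the runs from 1 is an assumption.
.P
.VERBON
.S -2
```python
import os
import subprocess

def run_loops(next_stage, loops, base_dir="."):
    """Run next_stage `loops` times, each in a numbered directory."""
    run_dirs = []
    for run in range(1, loops + 1):
        # One directory per run, named after the run number
        # (assumed to start at 1).
        run_dir = os.path.join(base_dir, str(run))
        os.makedirs(run_dir, exist_ok=True)
        # The next stage starts inside the run directory, so results
        # written with a relative path are placed there.
        subprocess.run(next_stage, cwd=run_dir, check=True)
        run_dirs.append(run_dir)
    return run_dirs
```
.S P P
.VERBOFF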
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
\(rh \fBcontrol\fP comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBsrun\fP: launches the tasks in the compute nodes and sets the
affinity.
.P
From here on, all stages are executed in parallel for each task.
.P
The srun program also forks from a SLURM daemon, exiting the
previous isolated namespace.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
\(rh \fBsrun\fP comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBisolate\fP: enters the isolated namespace again.
.P
Now we are ready to execute the program of the experiment.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
\(rh \fBisolate\fP comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBexec\fP: sets the environment variables and argv of the program.
.P
Additional commands can be specified in the \fCpre\fP and \fCpost\fP
attributes.
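.P
As a rough illustration only, the behaviour can be sketched in Python.
The \fCrun_program\fP function is hypothetical, and it assumes that
\fCpre\fP and \fCpost\fP are lists of commands that see the same
environment as the program.
.P
.VERBON
.S -2
```python
import os
import subprocess

def run_program(program, argv=(), env=None, pre=(), post=()):
    """Run the pre commands, the program and the post commands."""
    # Start from the current environment and add the extra variables.
    full_env = dict(os.environ)
    full_env.update(env or {})
    for cmd in pre:
        subprocess.run(cmd, env=full_env, check=True)
    # By convention, argv[0] is the program itself.
    result = subprocess.run([program] + list(argv),
                            env=full_env, check=True)
    for cmd in post:
        subprocess.run(cmd, env=full_env, check=True)
    return result.returncode
```
.S P P
.VERBOFF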
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
\(rh \fBexec\fP comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
\fBprogram\fP: the path to the program itself.
.P
This stage can be used to make some changes:
.BL
.LI
Set the MPI implementation of all dependencies.
.LI
Pass build options.
.LI
Use custom packages (nanos6 with jemalloc).
.LE
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
isolate login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
isolate comp no no no yes
control comp yes no no yes
srun comp yes no no yes
isolate comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
\(rh \fBprogram\fP comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Execution stages"
.2C
The \fCstdexp.nix\fP file defines the standard pipeline. The last two stages
are usually added to complete the pipeline:
.P
.VERBON
pipeline = stdPipeline ++
[ exec program ];
.VERBOFF
.P
Any stage can be modified to fit a custom experiment.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
\(rh \fBtrebuchet\fP xeon no no yes \fByes\fP
\(rh \fBrunexp\fP login no no yes \fByes\fP
\(rh \fBisolate\fP login no no no \fByes\fP
\(rh \fBexperiment\fP login yes no no \fByes\fP
\(rh \fBunit\fP login yes no no \fByes\fP
\(rh \fBsbatch\fP login yes no no \fByes\fP
_ _ _ _ _ _
\(rh \fBisolate\fP comp no no no \fByes\fP
\(rh \fBcontrol\fP comp yes no no \fByes\fP
\(rh \fBsrun\fP comp yes no no \fByes\fP
\(rh \fBisolate\fP comp no yes no \fByes\fP
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Isolated execution"
.2C
The filesystem is \fBnow\fP isolated to prevent irreproducible
scenarios.
.P
The nix store is mounted at /nix and only a few other paths are
available, such as:
.BL
.S -1 1m
.LI
/var/run/munge (required for SLURM)
.LI
/dev, /sys, /proc for MPI comm
.LI
/etc for hosts (FIXME)
.LI
/gpfs/projects/bsc15 to store data
.LE
.S P P
.P
Additional mounts can be requested by using the \fCextraMounts\fP
attribute.
.\" Go to the next column
.NCOL
.KF
.S 8 14p
.\".S C +0.2v
.TS
center expand;
lB lB cB cB cB cB cB
lB lB cB cB cB cB cB
r lw(5.5m) c c c c c.
_ _ _ _ _ _
Stage Target Safe Copies User Std
_ _ _ _ _ _
trebuchet xeon no no yes yes
runexp login no no yes yes
\(rh \fBisolate\fP login no no no yes
experiment login yes no no yes
unit login yes no no yes
sbatch login yes no no yes
_ _ _ _ _ _
\(rh \fBisolate\fP comp no no no yes
control comp yes no no yes
srun comp yes no no yes
\(rh \fBisolate\fP comp no yes no yes
_ _ _ _ _ _
exec comp yes yes no no
program comp yes yes no no
_ _ _ _ _ _
.TE
.S P P
.KE
.1C
.\"==================================================================
.NS "Running the EP"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP" fill
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
box "ResultTree"
arrow
circle "Build PP"
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
We cannot access MN4 from nix, as it has neither the SSH keys nor
network access when building derivations.
.P
The garlic(1) tool is used to run experiments and fetch the results. See
the manual for details.
.\"==================================================================
.NS "Running the EP"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP" fill
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
box "ResultTree"
arrow
circle "Build PP"
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
To launch the EP use \fBgarlic -R\fP and provide the trebuchet path:
.P
.VERBON
.S -2
xeon07$ garlic -Rv /nix/store/5zhmdzi5mf0mfsran74cxngn07ba522m-trebuchet
Running experiment 1qcc...9w5-experiment
sbatch: error: spank: x11.so: Plugin file not found
Submitted batch job 12719522
\&...
xeon07$
.S P P
.VERBOFF
.P
Once the jobs are submitted, you can leave the session: they will run
on MN4 automatically at some point.
.\"==================================================================
.NS "Execution complete"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP"
arrow
box "Result" "(MN4)" dashed fill
arrow
circle "Fetch"
arrow
box "ResultTree"
arrow
circle "Build PP"
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
When the EP is complete, the generated results are stored in MN4.
.P
As stated previously, nix cannot access MN4 (yet), so we need to manually
fetch the results.
.\"==================================================================
.NS "Fetching the results"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP"
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch" fill
arrow
box "ResultTree"
arrow
circle "Build PP"
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
To fetch the results, use \fBgarlic -F\fP:
.P
.VERBON
.S -3.5
xeon07$ garlic -Fv /nix/store/5zhmdzi5mf0mfsran74cxngn07ba522m-trebuchet
/mnt/garlic/bsc15557/out/1qc...9w5-experiment: checking units
3qnm6drx5y95kxrr43gnwqz8v4x641c7-unit: running 7 of 10
awd3jzbcw0cwwvjrcrxzjvii3mgj663d-unit: completed
bqnnrwcbcixag0dfflk1zz34zidk97nf-unit: no status
\&...
/mn...w5-experiment: \f[CB]execution complete, fetching results\fP
these derivations will be built:
/nix/store/mqdr...q4z-resultTree.drv
\&...
\f[CB]/nix/store/jql41hms1dr49ipbjcw41i4dj4pq2cb0-resultTree\fP
.S P P
.VERBOFF
.P
Notice that if the experiments are still running, it waits for the
completion of all units first.
.\"==================================================================
.NS "Fetching the results"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP"
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
box "ResultTree" fill
arrow
circle "Build PP"
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
.VERBON
.S -3.5
\&...
\f[CB]/nix/store/jql41hms1dr49ipbjcw41i4dj4pq2cb0-resultTree\fP
.S P P
.VERBOFF
.P
When the fetch operation succeeds, the \fBresultTree\fP derivation is
built, with the \fBlogs\fP of the execution.
.P
All other generated data is \fBignored for now\fP, as we don't want to
store large files in the nix store of xeon07.
.\"==================================================================
.NS "Running and fetching"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP" fill
arrow
box "Result" "(MN4)" dashed fill
arrow
circle "Fetch" fill
arrow
box "ResultTree" fill
arrow
circle "Build PP"
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
You can run an experiment and fetch the results with \fBgarlic -RF\fP in
one go:
.P
.VERBON
.S -2
xeon07$ garlic -RF /nix/store/5zhmdzi5mf0mfsran74cxngn07ba522m-trebuchet
.S P P
.VERBOFF
.P
Remember that you can interrupt the fetching while it is waiting and come
back later if the experiment takes too long.
.P
If nix tries to build \fBResultTree\fP and doesn't find the experiment
results, it will tell you to run this command to run and fetch the
experiment. Example: building the figure before running the experiment:
.P
.VERBON
.S -2
xeon07$ nix-build -A fig.nbody.baseline
.S P P
.VERBOFF
.\"==================================================================
.NS "Postprocess pipeline (PP)"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
R: circle "Run EP"
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
box "ResultTree"
arrow
circle "Build PP" fill
arrow
F: box "Figure"
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
Once the \fBresultTree\fP derivation is built, multiple figures can be created
without re-running the experiment.
.P
The postprocess pipeline is formed of several stages as well, but is
considered \fBexperimental\fP; there is no standard yet.
.P
It only needs to be built, as nix can perform all tasks to create the
figures (no manual intervention).
.\"==================================================================
.NS "Building the postprocess pipeline (PP)"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
circle "Run EP"
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
R: box "ResultTree"
arrow
PP: circle "Build PP" fill
arrow
F: box "Figure"
circlerad=0.2;
linewid=0.3;
T: box at R + (-0.02,-0.8) "timetable"
arrow
box "merge"
arrow
P: box "rPlot"
line from PP.sw to T.n dashed
line from PP.se to P.n dashed
arrow <-> from T.w - (0, 0.35) to P.e - (0, 0.35) \
"Postprocess pipeline stages" below
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
To build the figure, only three stages are required: timetable, merge
and rPlot.
.\"==================================================================
.NS "PP stages: timetable"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
box "timetable" fill
arrow
box "merge"
arrow
P: box "rPlot"
.PE
.S P P
.DE
.P
The timetable transforms the logs of the execution into an NDJSON file,
in which each line is a JSON object with the complete unit configuration
and the execution time of one run:
.P
.VERBON
.S -2
{ "unit":"...", "experiment":"...", "run":1, "config":{...}, "time":1.2345 }
{ "unit":"...", "experiment":"...", "run":2, "config":{...}, "time":1.2333 }
{ "unit":"...", "experiment":"...", "run":3, "config":{...}, "time":1.2323 }
.S P P
.VERBOFF
.P
This format allows R (and possibly other programs) to load \fBall\fP
information regarding the experiment configuration into a table.
.P
It requires the execution logs to contain a line with the time:
.P
.VERBON
.S -2
time 1.2345
.S P P
.VERBOFF
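.P
A rough Python sketch of this transformation, assuming one log per run.
The function names are hypothetical and the real stage may work
differently; taking the last \fCtime\fP line is also an assumption.
.P
.VERBON
.S -2
```python
import json

def parse_time(log_text):
    """Return the value of the last `time <seconds>` line in a log."""
    value = None
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "time":
            value = float(fields[1])
    if value is None:
        raise ValueError("no time line found in the log")
    return value

def timetable_record(unit, experiment, run, config, log_text):
    """Build one NDJSON line with the keys shown above."""
    record = {
        "unit": unit,
        "experiment": experiment,
        "run": run,
        "config": config,
        "time": parse_time(log_text),
    }
    # json.dumps emits a single line, as NDJSON requires.
    return json.dumps(record)
```
.S P P
.VERBOFF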
.\"==================================================================
.NS "PP stages: merge"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
box "timetable"
arrow
box "merge" fill
arrow
P: box "rPlot"
.PE
.S P P
.DE
.P
The merge stage allows multiple results of several experiments to be
merged in one dataset.
.P
In this way, multiple results can be presented in one figure.
.P
It simply concatenates all the NDJSON files together.
.P
This stage can be built directly with:
.P
.VERBON
$ nix-build -A ds.nbody.baseline
.VERBOFF
.P
So you can inspect the dataset and play with it before generating the
plots (it is automatically built by nix as a dependency).
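.P
The concatenation itself can be sketched in a few lines of Python. The
\fCmerge_ndjson\fP name is hypothetical, and skipping blank lines is an
assumption made to keep one JSON object per line.
.P
.VERBON
.S -2
```python
def merge_ndjson(input_paths, output_path):
    """Concatenate NDJSON files, one JSON object per line."""
    count = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as src:
                for line in src:
                    line = line.strip()
                    if line:  # skip blank lines between records
                        print(line, file=out)
                        count += 1
    # Number of records written to the merged dataset.
    return count
```
.S P P
.VERBOFF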
.\"==================================================================
.NS "PP stages: rPlot"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
box "timetable"
arrow
box "merge"
arrow
P: box "rPlot" fill
.PE
.S P P
.DE
.P
Finally, the rPlot stage runs an R script that loads the NDJSON dataset
and generates some plots.
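.P
The stage itself uses R. Only to illustrate what loading the dataset
into a table involves, here is a Python sketch with hypothetical
function names. The \fCconfig.\fP column prefix is an assumption.
.P
.VERBON
.S -2
```python
import json

def load_ndjson(path):
    """Load an NDJSON dataset as a list of flat rows (dicts)."""
    rows = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            # Flatten the nested configuration so that every option
            # becomes its own column, prefixed with "config.".
            row = {k: v for k, v in record.items() if k != "config"}
            for key, value in record.get("config", {}).items():
                row["config." + key] = value
            rows.append(row)
    return rows

def mean_time(rows):
    """Mean execution time over all rows."""
    times = [row["time"] for row in rows]
    return sum(times) / len(times)
```
.S P P
.VERBOFF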
.\"==================================================================
.NS "Building the figures"
.DS CB
.S -3.5
.PS
circlerad=0.25;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
B: circle "Build EP"
arrow
circle "Run EP"
arrow
box "Result" "(MN4)" dashed
arrow
circle "Fetch"
arrow
R: box "ResultTree"
arrow
PP: circle "Build PP"
arrow
F: box "Figure" fill
arrow from B.w + (0, 0.35) to F.e + (0, 0.35) \
"Order of execution" above
.PE
.S P P
.DE
.P
The complete PP and the figures can be built with:
.P
.VERBON
xeon07$ nix-build -A fig.nbody.baseline
.VERBOFF
.P
An interactive R shell can be used to play with the presentation of the
plots:
.P
.VERBON
xeon07$ nix-shell garlic/fig/dev/shell.nix
$ cp /nix/store/...-merge.json input.json
$ R
> source("garlic/fig/nbody/baseline.R")
.VERBOFF
.P
More about this later.
.\"==================================================================
.NS "Figure dependencies"
.DS CB
.S -3.5
.PS
circlerad=0.3;
linewid=0.3;
boxwid=0.52;
boxht=0.35;
fillval=0.2;
right
P: box "Program"
arrow
box "..."
arrow
T: box "Trebuchet"
arrow
box "Result" "(MN4)" dashed
arrow
R: box "ResultTree"
arrow
box "..."
arrow
F: box "Figure" fill
arrow <-> from P.nw + (0, 0.2) to T.ne + (0, 0.2) \
"Execution pipeline (EP)" above
arrow <-> from R.nw + (0, 0.2) to F.ne + (0, 0.2) \
"Postprocess pipeline (PP)" above
.PE
.S P P
.DE
.P
The figure has the whole EP, the results and the PP as dependencies.
.P
Any change in any of the stages (or dependencies) will lead to a new
figure, \fBautomatically\fP.
.P
Figures contain the hash of the dataset in the title, so they can
be tracked.
.\"==================================================================
.NS "Article with figures"
.P
An example LaTeX document uses the name of the figures in nix:
.P
.VERBON
\\includegraphics[]{@fig.nbody.small@/scatter.png}
.VERBOFF
.P
Then, nix will extract all figure references, build them (re-running the
experiment if required) and build the report: \fC$ nix-build -A
garlic.report\fP
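.P
Extracting the references could look like the following Python sketch.
It is not the code used by garlic: the \fCfigure_refs\fP name is
hypothetical, and the pattern assumes that every reference starts with
\fCfig.\fP and is enclosed in @ signs, as in the example above.
.P
.VERBON
.S -2
```python
import re

# A figure reference looks like @fig.nbody.small@ in the LaTeX source.
FIG_REF = re.compile("@(fig[.][A-Za-z0-9_.]+)@")

def figure_refs(latex_source):
    """Return the unique figure names, in order of appearance."""
    seen = []
    for name in FIG_REF.findall(latex_source):
        if name not in seen:
            seen.append(name)
    return seen
```
.S P P
.VERBOFF
.P
Each name returned is an attribute that nix can build before the
report.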
.P
We also have \fBreportTar\fP that puts the figures, LaTeX sources and
a Makefile required to build the report into a self-contained tar.gz.
.P
It can be compiled with \fBmake\fP (no nix required) so it can be sent
to a journal for further changes in the LaTeX source.
.\"==================================================================
.NS "Other changes"
.DL
.LI
We can provide the complete benchmark and BSC packages as a simple
overlay. This allows others to load their own changes on top or below our
benchmark.
.LI
We now avoid reevaluation of nixpkgs when setting the MPI
implementation (allows faster evaluations: 2 s/unit \(-> 2 s total).
.LI
Dependencies between experiment results are possible (experimental):
this allows the generation of a dataset plus a computation that depends on it.
.LE
.\"==================================================================
.NS "Questions?"
.defcolor gray rgb #bbbbbb
\m[gray]
.P
Example questions:
.DL
.LI
What software was used to build this presentation?
.LI
I used groff.
.LI
And the diagrams?
.LI
Same :-D
.LI
How long does it take to build?
.LI
0,39s user 0,02s system 129% cpu 0,316 total
.LE
\m[]