182 lines
5.1 KiB
Plaintext
182 lines
5.1 KiB
Plaintext
|
.COVER
|
||
|
.TL
|
||
|
Garlic: User guide
|
||
|
.AF "Barcelona Supercomputing Center"
|
||
|
.AU "Rodrigo Arias Mallo"
|
||
|
.COVEND
|
||
|
.H 1 "Overview"
|
||
|
Dependency graph of a complete experiment that produces a figure. Each box
|
||
|
is a derivation and arrows represent \fBbuild dependencies\fP.
|
||
|
.DS CB
|
||
|
.PS
|
||
|
linewid=0.9;
|
||
|
right
|
||
|
box "Source" "code"
|
||
|
arrow <-> "Develop" above
|
||
|
box "Program"
|
||
|
arrow <-> "Experiment" above
|
||
|
box "Results"
|
||
|
arrow <-> "Data" "exploration"
|
||
|
box "Figures"
|
||
|
.PE
|
||
|
.DE
|
||
|
.H 1 "Development"
|
||
|
.P
|
||
|
The development phase consists in creating a functional program by
|
||
|
modifying the source code. This process is generally cyclic, where the
|
||
|
developer needs to compile the program, correct mistakes and debug the
|
||
|
program.
|
||
|
.P
|
||
|
It requires to be running in the target machine.
|
||
|
.\" ===================================================================
|
||
|
.H 1 "Experimentation"
|
||
|
The experimentation phase begins with a functional program which is the
|
||
|
object of study. The experimenter then designs an experiment aimed at
|
||
|
measuring some properties of the program. The experiment is then
|
||
|
executed and the results are stored for further analysis.
|
||
|
.H 2 "Writing the experiment configuration"
|
||
|
.P
|
||
|
The term experiment is quite overloaded in this document. We are going
|
||
|
to see how to write the recipe that describes the execution pipeline of
|
||
|
an experiment.
|
||
|
.P
|
||
|
Within the garlic benchmark, experiments are typically sorted by a
|
||
|
hierarchy depending on which application they belong. Take a look at the
|
||
|
\fCgarlic/exp\fP directory and you will find some folders and .nix
|
||
|
files.
|
||
|
.P
|
||
|
Each of those recipes files describe a function that returns a
|
||
|
derivation, which, once built will result in the first stage script of
|
||
|
the execution pipeline.
|
||
|
.P
|
||
|
The first part of states the name of the attributes required as the
|
||
|
input of the function. Typically some packages, common tools and options:
|
||
|
.DS I
|
||
|
.VERBON
|
||
|
{
|
||
|
stdenv
|
||
|
, stdexp
|
||
|
, bsc
|
||
|
, targetMachine
|
||
|
, stages
|
||
|
, garlicTools
|
||
|
}:
|
||
|
.VERBOFF
|
||
|
.DE
|
||
|
.P
|
||
|
Notice the \fCtargetMachine\fP argument, which provides information
|
||
|
about the machine in which the experiment will run. You should write
|
||
|
your experiment in such a way that runs in multiple clusters.
|
||
|
.DS I
|
||
|
.VERBON
|
||
|
varConf = {
|
||
|
blocks = [ 1 2 4 ];
|
||
|
nodes = [ 1 ];
|
||
|
};
|
||
|
.VERBOFF
|
||
|
.DE
|
||
|
.P
|
||
|
The \fCvarConf\fP is the attribute set that allows you to vary some
|
||
|
factors in the experiment.
|
||
|
.DS I
|
||
|
.VERBON
|
||
|
genConf = var: fix (self: targetMachine.config // {
|
||
|
expName = "example";
|
||
|
unitName = self.expName + "-b" + toString self.blocks;
|
||
|
blocks = var.blocks;
|
||
|
nodes = var.nodes;
|
||
|
cpusPerTask = 1;
|
||
|
tasksPerNode = self.hw.socketsPerNode;
|
||
|
});
|
||
|
.VERBOFF
|
||
|
.DE
|
||
|
.P
|
||
|
The \fCgenConf\fP function is the central part of the description of the
|
||
|
experiment. Takes as input \fBone\fP configuration from the cartesian
|
||
|
product of
|
||
|
.I varConfig
|
||
|
and returns the complete configuration. In our case, it will be
|
||
|
called 3 times, with the following inputs at each time:
|
||
|
.DS I
|
||
|
.VERBON
|
||
|
{ blocks = 1; nodes = 1; }
|
||
|
{ blocks = 2; nodes = 1; }
|
||
|
{ blocks = 4; nodes = 1; }
|
||
|
.VERBOFF
|
||
|
.DE
|
||
|
.P
|
||
|
The return value can be inspected by calling the function in the
|
||
|
interactive nix repl:
|
||
|
.DS I
|
||
|
.VERBON
|
||
|
nix-repl> genConf { blocks = 2; nodes = 1; }
|
||
|
{
|
||
|
blocks = 2;
|
||
|
cpusPerTask = 1;
|
||
|
expName = "example";
|
||
|
hw = { ... };
|
||
|
march = "skylake-avx512";
|
||
|
mtune = "skylake-avx512";
|
||
|
name = "mn4";
|
||
|
nixPrefix = "/gpfs/projects/bsc15/nix";
|
||
|
nodes = 1;
|
||
|
sshHost = "mn1";
|
||
|
tasksPerNode = 2;
|
||
|
unitName = "example-b2";
|
||
|
}
|
||
|
.VERBOFF
|
||
|
.DE
|
||
|
.P
|
||
|
Some configuration parameters were added by
|
||
|
.I targetMachine.config ,
|
||
|
such as the
|
||
|
.I nixPrefix ,
|
||
|
.I sshHost
|
||
|
or the
|
||
|
.I hw
|
||
|
attribute set, which are specific for the cluster they experiment is
|
||
|
going to run. Also, the
|
||
|
.I unitName
|
||
|
got assigned the proper name based on the number of blocks, but the
|
||
|
number of tasks per node were assigned based on the hardware description
|
||
|
of the target machine.
|
||
|
.P
|
||
|
By following this rule, the experiments can easily be ported to machines
|
||
|
with other hardware characteristics, and we only need to define the
|
||
|
hardware details once. Then all the experiments will be updated based on
|
||
|
those details.
|
||
|
.H 2 "First steps"
|
||
|
.P
|
||
|
The complete results generally take a long time to be finished, so it is
|
||
|
advisable to design the experiments iteratively, in order to quickly
|
||
|
obtain some feedback. Some recommendations:
|
||
|
.BL
|
||
|
.LI
|
||
|
Start with one unit only.
|
||
|
.LI
|
||
|
Set the number of runs low (say 5) but more than one.
|
||
|
.LI
|
||
|
Use a small problem size, so the execution time is low.
|
||
|
.LI
|
||
|
Set the time limit low, so deadlocks are caught early.
|
||
|
.LE
|
||
|
.P
|
||
|
As soon as the first runs are complete, examine the results and test
|
||
|
that everything looks good. You would likely want to check:
|
||
|
.BL
|
||
|
.LI
|
||
|
The resources where assigned as intended (nodes and CPU affinity).
|
||
|
.LI
|
||
|
No errors or warnings: look at stderr and stdout logs.
|
||
|
.LI
|
||
|
If a deadlock happens, it will run out of the time limit.
|
||
|
.LE
|
||
|
.P
|
||
|
As you gain confidence over that the execution went as planned, begin
|
||
|
increasing the problem size, the number of runs, the time limit and
|
||
|
lastly the number of units. The rationale is that each unit that is
|
||
|
shared among experiments gets assigned the same hash. Therefore, you can
|
||
|
iteratively add more units to an experiment, and if they are already
|
||
|
executed (and the results were generated) is reused.
|
||
|
.TC
|