diff --git a/garlic/doc/ug.mm b/garlic/doc/ug.mm
new file mode 100644
index 0000000..b0a4c48
--- /dev/null
+++ b/garlic/doc/ug.mm
@@ -0,0 +1,181 @@
+.COVER
+.TL
+Garlic: User guide
+.AF "Barcelona Supercomputing Center"
+.AU "Rodrigo Arias Mallo"
+.COVEND
+.H 1 "Overview"
+Dependency graph of a complete experiment that produces a figure. Each box
+is a derivation and arrows represent \fBbuild dependencies\fP.
+.DS CB
+.PS
+linewid=0.9;
+right
+box "Source" "code"
+arrow <-> "Develop" above
+box "Program"
+arrow <-> "Experiment" above
+box "Results"
+arrow <-> "Data" "exploration"
+box "Figures"
+.PE
+.DE
+.H 1 "Development"
+.P
+The development phase consists of creating a functional program by
+modifying the source code. This process is generally cyclic: the
+developer needs to compile the program, correct mistakes and debug the
+program.
+.P
+This phase needs to be carried out on the target machine.
+.\" ===================================================================
+.H 1 "Experimentation"
+The experimentation phase begins with a functional program which is the
+object of study. The experimenter then designs an experiment aimed at
+measuring some properties of the program. The experiment is then
+executed and the results are stored for further analysis.
+.H 2 "Writing the experiment configuration"
+.P
+The term experiment is quite overloaded in this document. We are going
+to see how to write the recipe that describes the execution pipeline of
+an experiment.
+.P
+Within the garlic benchmark, experiments are typically organized in a
+hierarchy according to the application they belong to. Take a look at the
+\fCgarlic/exp\fP directory and you will find some folders and .nix
+files.
+.P
+Each of those recipe files describes a function that returns a
+derivation which, once built, results in the first stage script of
+the execution pipeline.
+.P
+The first part states the names of the attributes required as the
+input of the function.
+Typically these are some packages, common tools and options:
+.DS I
+.VERBON
+{
+  stdenv
+, stdexp
+, bsc
+, targetMachine
+, stages
+, garlicTools
+}:
+.VERBOFF
+.DE
+.P
+Notice the \fCtargetMachine\fP argument, which provides information
+about the machine on which the experiment will run. You should write
+your experiment in such a way that it runs on multiple clusters.
+.DS I
+.VERBON
+varConf = {
+  blocks = [ 1 2 4 ];
+  nodes = [ 1 ];
+};
+.VERBOFF
+.DE
+.P
+The \fCvarConf\fP attribute set allows you to vary some factors in
+the experiment.
+.DS I
+.VERBON
+genConf = var: fix (self: targetMachine.config // {
+  expName = "example";
+  unitName = self.expName + "-b" + toString self.blocks;
+  blocks = var.blocks;
+  nodes = var.nodes;
+  cpusPerTask = 1;
+  tasksPerNode = self.hw.socketsPerNode;
+});
+.VERBOFF
+.DE
+.P
+The \fCgenConf\fP function is the central part of the description of the
+experiment. It takes as input \fBone\fP configuration from the Cartesian
+product of
+.I varConf
+and returns the complete configuration. In our case, it will be
+called 3 times, with the following input each time:
+.DS I
+.VERBON
+{ blocks = 1; nodes = 1; }
+{ blocks = 2; nodes = 1; }
+{ blocks = 4; nodes = 1; }
+.VERBOFF
+.DE
+.P
+The return value can be inspected by calling the function in the
+interactive nix repl:
+.DS I
+.VERBON
+nix-repl> genConf { blocks = 2; nodes = 1; }
+{
+  blocks = 2;
+  cpusPerTask = 1;
+  expName = "example";
+  hw = { ... };
+  march = "skylake-avx512";
+  mtune = "skylake-avx512";
+  name = "mn4";
+  nixPrefix = "/gpfs/projects/bsc15/nix";
+  nodes = 1;
+  sshHost = "mn1";
+  tasksPerNode = 2;
+  unitName = "example-b2";
+}
+.VERBOFF
+.DE
+.P
+Some configuration parameters were added by
+.I targetMachine.config ,
+such as the
+.I nixPrefix ,
+.I sshHost
+or the
+.I hw
+attribute set, which are specific to the cluster where the experiment will run.
+Also, the
+.I unitName
+got assigned the proper name based on the number of blocks, and the
+number of tasks per node was assigned based on the hardware description
+of the target machine.
+.P
+By following this rule, the experiments can easily be ported to machines
+with other hardware characteristics, and we only need to define the hardware
+details once. Then all the experiments will be updated based on those details.
+.H 2 "First steps"
+.P
+The complete results generally take a long time to finish, so it is
+advisable to design the experiments iteratively, in order to quickly
+obtain some feedback. Some recommendations:
+.BL
+.LI
+Start with one unit only.
+.LI
+Set the number of runs low (say 5) but more than one.
+.LI
+Use a small problem size, so the execution time is low.
+.LI
+Set the time limit low, so deadlocks are caught early.
+.LE
+.P
+As soon as the first runs are complete, examine the results and check
+that everything looks good. You will likely want to verify:
+.BL
+.LI
+The resources were assigned as intended (nodes and CPU affinity).
+.LI
+No errors or warnings: look at stderr and stdout logs.
+.LI
+No deadlocks: if one happens, the execution runs out of the time limit.
+.LE
+.P
+As you gain confidence that the execution went as planned, begin
+increasing the problem size, the number of runs, the time limit and
+lastly the number of units. The rationale is that each unit that is
+shared among experiments gets assigned the same hash. Therefore, you can
+iteratively add more units to an experiment, and those that were already
+executed (and whose results were generated) are reused.
+.TC