.\" Header point size
.ds HP "15 12 12 0 0 0 0 0 0 0 0 0 0 0"
.COVER
.TL
Garlic: User guide
.AF "Barcelona Supercomputing Center"
.AU "Rodrigo Arias Mallo"
.COVEND
.H 1 "Overview"
.P
The garlic framework is designed to cover the needs of an experimenter at every step of the work, up to publication. The experience gained while using it suggests that we move along the three stages depicted in the following diagram:
.DS CB
.PS
linewid=0.9;
right
box "Source" "code"
arrow "Development" above
box "Program"
arrow "Experiment" above
box "Results"
arrow "Data" "exploration"
box "Figures"
.PE
.DE
.P
In the development phase, the experimenter changes the source code in order to introduce new features or fix bugs. Once the program is considered functional, the next phase is the experimentation, where several experiment configurations are tested to evaluate the program. Some problems are commonly spotted during this phase, which leads the experimenter back to the development phase to change the source code.
.P
Finally, when the experiment is considered complete, the experimenter moves to the next phase, which involves the exploration of the data generated by the experiment. During this phase, it is common to generate results in the form of plots or tables, which provide clear insight into the quantities of interest. It is also common that, after looking at the figures, some changes need to be introduced in the experiment configuration (or even in the source code of the program).
.P
Therefore, the experimenter may move forward and backward along the three phases several times. The garlic framework provides support for all three stages (with different degrees of maturity).
.H 1 "Development (work in progress)"
.P
During the development phase, a functional program is produced by modifying its source code. This process is generally cyclic: the developer needs to compile, debug and correct mistakes.
We want to minimize the waiting time, so that programs can be executed as soon as needed, but within a controlled environment, so that the same behavior is observed during the experimentation phase.
.P
The development phase is typically carried out directly on the target machine, so we first need to allocate some resources.
.H 2 "Allocating resources for development"
.P
Our target machine (MareNostrum 4) provides interactive shells, which can be requested with the computational resources needed for development.
.P
To do so, connect to it and allocate an interactive session:
.DS I
.VERBON
build% ssh target
target% salloc ...
compute%
.VERBOFF
.DE
.P
This operation may take a few minutes to complete, depending on the load of the cluster. However, once the session is ready, any subsequent execution starts immediately.
.H 2 "Getting the development tools"
.P
To get the same packages provided for the experiments, we can use the \fInix-develop\fP utility, which creates a namespace where the required packages are installed. Use the build machine to generate a development environment:
.DS I
.VERBON
build% nix-build -A garlic.develop
\&...
build% grep ln result
ln -fs /gpfs/projects/bsc15/nix/...olate/bin/stage1 .nix-develop
.VERBOFF
.DE
.P
Copy the \fIln\fP command and run it on the target machine, inside a new directory used for the development of your program. The link is placed in a hidden file named \fI.nix-develop\fP, which is used to remember your environment. Several environments, each with different packages, can be stored this way in different directories.
.P
Now you can access the newly created environment by running:
.DS I
.VERBON
compute% nix-develop
develop%
.VERBOFF
.DE
.P
The spawned shell contains all the packages predefined in the \fIgarlic.develop\fP derivation, which can now be used by typing the name of the commands:
.DS I
.VERBON
develop% which gcc
/nix/store/azayfhqyg9...s8aqfmy-gcc-wrapper-9.3.0/bin/gcc
develop% which gdb
/nix/store/1c833b2y8j...pnjn2nv9d46zv44dk-gdb-9.2/bin/gdb
.VERBOFF
.DE
.P
If you need additional packages, you can add them to the \fIgarlic/index.nix\fP file:
.\" FIXME: Unify garlic.unsafeDevelop in garlic.develop, so we can
.\" specify the packages directly
.DS I
.VERBON
  unsafeDevelop = callPackage ./develop/default.nix {
    extraInputs = with self; [
      coreutils htop procps-ng vim which strace
      tmux gdb kakoune universal-ctags bashInteractive
      glibcLocales ncurses git screen curl
      # Add more nixpkgs packages here...
      bsc.slurm bsc.clangOmpss2 bsc.icc bsc.mcxx bsc.perf
      # Add more bscpkgs packages here...
    ];
  };
.VERBOFF
.DE
.P
Then repeat the previous steps to build the new development environment.
.H 2 "Execution"
.P
The allocated shell can only execute tasks on the current node, which may be enough for some tests. In that case, you can run your program directly:
.DS I
.VERBON
develop% ./program
.VERBOFF
.DE
.P
If you need to run a multi-node program, typically one using MPI communications, you can use srun. Note that you must have requested several nodes in the previous salloc call. If executed as-is, srun runs the given program \fBoutside\fP the development environment, so we re-enter it by calling nix-develop as a wrapper of the program:
.\" FIXME: wrap srun to reenter the develop environment by its own
.DS I
.VERBON
develop% srun nix-develop ./program
.VERBOFF
.DE
.H 2 "Debugging"
.P
If the program runs on a single node, the debugger can execute it directly:
.DS I
.VERBON
develop% gdb ./program
.VERBOFF
.DE
.P
Alternatively, gdb can be attached to an already running program by using its PID. You first need to connect to the node running it, and then run gdb inside the nix-develop environment.
Use squeue to find the compute nodes running your program:
.DS I
.VERBON
target% ssh compute
compute% cd project-develop
compute% nix-develop
develop% gdb -p $pid
.VERBOFF
.DE
.P
You can repeat this step on other nodes to control the execution across multiple nodes.
.P
When the program crashes before the debugger can be attached, you can enable the generation of core dumps:
.DS I
.VERBON
develop% ulimit -c unlimited
.VERBOFF
.DE
.P
Then rerun the program. When it crashes, it will generate a core file containing the state of the memory at the time of the crash, which can be opened with gdb. Beware that the core file can be very large, depending on the memory used by your program when it crashed.
.\" ===================================================================
.H 1 "Experimentation"
.P
The experimentation phase begins with a functional program, which is the object of study. The experimenter then designs an experiment aimed at measuring some properties of the program. The experiment is then executed and the results are stored for further analysis.
.H 2 "Writing the experiment configuration"
.P
The term \fIexperiment\fP is quite overloaded in this document. In this section, we describe how to write the recipe that defines the execution pipeline of an experiment.
.P
Within the garlic benchmark, experiments are typically organized in a hierarchy according to the application they belong to. In the \fCgarlic/exp\fP directory you will find several folders and .nix files.
.P
Each of those recipe files describes a function that returns a derivation which, once built, results in the first-stage script of the execution pipeline.
.P
The first part of the file states the names of the attributes required as input to the function.
These are typically some packages, common tools and options:
.DS I
.VERBON
{
  stdenv
, stdexp
, bsc
, targetMachine
, stages
, garlicTools
}:
.VERBOFF
.DE
.P
Notice the \fCtargetMachine\fP argument, which provides information about the machine on which the experiment will run. You should write your experiment so that it can run on multiple clusters.
.DS I
.VERBON
varConf = {
  blocks = [ 1 2 4 ];
  nodes = [ 1 ];
};
.VERBOFF
.DE
.P
The \fCvarConf\fP attribute set allows you to vary some factors of the experiment: each attribute lists the values to be tested for one factor.
.DS I
.VERBON
genConf = var: fix (self: targetMachine.config // {
  expName = "example";
  unitName = self.expName + "-b" + toString self.blocks;
  blocks = var.blocks;
  nodes = var.nodes;
  cpusPerTask = 1;
  tasksPerNode = self.hw.socketsPerNode;
});
.VERBOFF
.DE
.P
The \fCgenConf\fP function is the central part of the description of the experiment. It takes as input \fBone\fP configuration from the cartesian product of
.I varConf
and returns the complete configuration. In our case, it will be called 3 times, once with each of the following inputs:
.DS I
.VERBON
{ blocks = 1; nodes = 1; }
{ blocks = 2; nodes = 1; }
{ blocks = 4; nodes = 1; }
.VERBOFF
.DE
.P
The return value can be inspected by calling the function in the interactive nix repl:
.DS I
.VERBON
nix-repl> genConf { blocks = 2; nodes = 1; }
{
  blocks = 2;
  cpusPerTask = 1;
  expName = "example";
  hw = { ... };
  march = "skylake-avx512";
  mtune = "skylake-avx512";
  name = "mn4";
  nixPrefix = "/gpfs/projects/bsc15/nix";
  nodes = 1;
  sshHost = "mn1";
  tasksPerNode = 2;
  unitName = "example-b2";
}
.VERBOFF
.DE
.P
Some configuration parameters were added by
.I targetMachine.config ,
such as
.I nixPrefix ,
.I sshHost
or the
.I hw
attribute set, which are specific to the cluster where the experiment is going to run. Also, the
.I unitName
was assigned the proper name based on the number of blocks, and the number of tasks per node was set from the hardware description of the target machine.
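.P
The expansion of
.I varConf
into units can be reproduced outside of garlic, to see what
.I genConf
receives and returns for each unit. The following standalone Nix expression is only a minimal sketch under some assumptions: the file name \fCexample.nix\fP is hypothetical, \fCtargetMachine\fP is replaced by a hypothetical stand-in that only contains \fChw.socketsPerNode\fP (set to 2, matching \fCtasksPerNode = 2\fP in the nix repl output above), and the \fCfix\fP and \fCcartesianProduct\fP helpers are written with Nix builtins. Nixpkgs offers \fClib.fix\fP and \fClib.cartesianProductOfSets\fP for the same purposes, and garlic may compute the product differently.
.DS I
.VERBON
# example.nix (hypothetical file name), evaluate it with:
#   nix-instantiate --eval --strict example.nix
let
  # Fixed-point combinator, defined like lib.fix in nixpkgs.
  fix = f: let x = f x; in x;

  # Cartesian product of an attribute set of lists, built only
  # from builtins (nixpkgs has lib.cartesianProductOfSets).
  cartesianProduct = attrsOfLists:
    builtins.foldl'
      (acc: name:
        builtins.concatMap
          (attrs: map (value: attrs // { ${name} = value; })
                      attrsOfLists.${name})
          acc)
      [ { } ]
      (builtins.attrNames attrsOfLists);

  # Hypothetical stand-in for the real target machine: only the
  # attribute read by genConf is kept (2 sockets per node).
  targetMachine = {
    config = {
      hw = { socketsPerNode = 2; };
    };
  };

  varConf = {
    blocks = [ 1 2 4 ];
    nodes = [ 1 ];
  };

  # Same genConf as in the experiment recipe above.
  genConf = var: fix (self: targetMachine.config // {
    expName = "example";
    unitName = self.expName + "-b" + toString self.blocks;
    blocks = var.blocks;
    nodes = var.nodes;
    cpusPerTask = 1;
    tasksPerNode = self.hw.socketsPerNode;
  });

  # One complete configuration per element of the product.
  configs = map genConf (cartesianProduct varConf);
in
  # Keep only a few attributes so the result is easy to read.
  map (c: { inherit (c) unitName blocks tasksPerNode; }) configs
.VERBOFF
.DE
.P
Evaluating it with \fCnix-instantiate \-\-eval \-\-strict example.nix\fP should produce three configurations, with the unit names \fCexample-b1\fP, \fCexample-b2\fP and \fCexample-b4\fP, all of them with \fCtasksPerNode = 2\fP. Changing \fCsocketsPerNode\fP in the stand-in changes \fCtasksPerNode\fP in every unit without touching \fCgenConf\fP.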
.P
By deriving such values from the hardware description of the target machine instead of hard-coding them, the experiments can easily be ported to machines with other hardware characteristics, and the hardware details only need to be defined once. All the experiments are then updated based on those details.
.H 2 "First steps"
.P
Obtaining the complete results generally takes a long time, so it is advisable to design the experiments iteratively, in order to obtain feedback quickly. Some recommendations:
.BL
.LI
Start with one unit only.
.LI
Set the number of runs low (say 5), but greater than one.
.LI
Use a small problem size, so the execution time is short.
.LI
Set a low time limit, so deadlocks are caught early.
.LE
.P
As soon as the first runs are complete, examine the results and verify that everything looks good. You would likely want to check that:
.BL
.LI
The resources were assigned as intended (nodes and CPU affinity).
.LI
There are no errors or warnings in the stdout and stderr logs.
.LI
No deadlock occurred: a deadlocked run keeps running until it reaches the time limit.
.LE
.P
As you gain confidence that the execution went as planned, begin increasing the problem size, the number of runs, the time limit and, lastly, the number of units. The rationale is that a unit shared among experiments is assigned the same hash. Therefore, you can iteratively add more units to an experiment, and the units that were already executed (and whose results were generated) are reused.
.SK
.H 1 "Annex A: Branch name diagram"
.DS CB
.S -2
.PS 4.6/25.4
copy "gitbranch.pic"
.PE
.S P
.DE
.TC