user guide: use ms macros

Added HTML output
2021-02-04 14:49:02 +01:00
commit c46feb4bf2
parent 4d626bff97
3 changed files with 1024 additions and 1396 deletions
--- a/garlic/doc/Makefile
+++ b/garlic/doc/Makefile
@@ -1,5 +1,5 @@
 all: execution.pdf execution.utf8 execution.ascii pp.pdf pp.utf8 pp.ascii\
-	branch.pdf blackbox.pdf ug.pdf
+	branch.pdf blackbox.pdf ug.pdf ug.html
 TTYOPT=-rPO=4m -rLL=72m
 PDFOPT=-dpaper=a4 -rPO=4c -rLL=13c
@@ -8,26 +8,29 @@ PREPROC=-k -t -p -R
 POSTPROC=
 REGISTERS=-dcurdate="`date '+%Y-%m-%d'`"
 REGISTERS+=-dgitcommit="`git rev-parse HEAD`"
 PREPROC+=$(REGISTERS)
 HTML_OPT=$(PREPROC) -P-y -P-V -P-Dimg -P-i120 -Thtml
 # Embed fonts?
 #POSTPROC+=-P -e
 blackbox.pdf: blackbox.ms Makefile
 	REFER=ref.i groff -ms $(PREPROC) -dpaper=a4 -rPO=2c -rLL=17c -Tpdf $< > $@
 ug.pdf: ug.mm Makefile
 	groff -mm $(PREPROC) $(POSTPROC) $(REGISTERS) -dpaper=a4 -Tpdf $< > $@
 	-killall -HUP mupdf
 %.html: %.ms Makefile
-	REFER=ref.i groff -ms $(PREPROC) $(POSTPROC) $(REGISTERS) -Thtml $< > $@
+	REFER=ref.i groff -ms -mwww $(HTML_OPT) $< > $@
 	echo $(HTML_OPT)
 	sed -i '/<\/head>/i<link rel="stylesheet" href="s.css">' $@
 	sed -i 's/^<a name="\([^"]*\)"><\/a>/<a name="\1" href="#\1">\&sect;<\/a>/g' $@
 	#sed -i '/<h1 /,/<hr>/s/^<a href="#[0-9]\+\.[0-9]\+\.[0-9]\+.*//' $@
 	sed -i '/<h1 /,/<hr>/s/^<a href="#[0-9]\+\.[0-9]\+.*//' $@
 %.pdf: %.ms Makefile
-	REFER=ref.i groff -ms $(PREPROC) $(PDFOPT) -Tpdf $< > $@
+	REFER=ref.i groff -ms -mwww $(PREPROC) $(PDFOPT) -Tpdf $< > $@
 	-killall -HUP mupdf
 %.utf8: %.ms
-	REFER=ref.i groff -ms $(PREPROC) $(TTYOPT) -Tutf8 $^ > $@
+	REFER=ref.i groff -ms -mwww $(PREPROC) $(TTYOPT) -Tutf8 $^ > $@
 %.ascii: %.ms
-	REFER=ref.i groff -ms -c $(PREPROC) $(TTYOPT) -Tascii $^ > $@
+	REFER=ref.i groff -ms -mwww -c $(PREPROC) $(TTYOPT) -Tascii $^ > $@
--- a/garlic/doc/ug.mm
+++ b/garlic/doc/ug.mm
@@ -1,843 +0,0 @@
 .ds HP "21 16 13 12 0 0 0 0 0 0 0 0 0 0"
 .nr Ej 1
 .nr Hb 3
 .nr Hs 3
 .S 11p 1.3m
 .PH "''''"
 .PF "''''"
 .PGFORM 14c 29c 3.5c
 .\".COVER
 .\".de cov@print-date
 .\".DS C
 .\"\\*[cov*new-date]
 .\".DE
 .\"..
 .\".TL
 .\".ps 20
 .\"Garlic: User guide
 .\".AF "Barcelona Supercomputing Center"
 .\".AU "Rodrigo Arias Mallo"
 .\".COVEND
 \&
 .SP 3c
 .DS C
 .S 25 1
 Garlic: User guide
 .S P P
 .SP 1v
 .S 12 1.5m
 Rodrigo Arias Mallo
 .I "Barcelona Supercomputing Center"
 \*[curdate]
 .S P P
 .SP 15c
 .S 9 1.5m
 Git commit hash
 \f(CW\*[gitcommit]\fP
 .S P P
 .DE
 .bp
 .PF "''%''"
 .\" ===================================================================
 .H 1 "Introduction"
 .P
 The garlic framework provides all the tools to experiment with HPC
 programs and produce publication articles.
 .\" ===================================================================
 .H 2 "Machines and clusters"
 Our current setup employs multiple machines to build and execute the
 experiments. Each cluster and node has its own name, and the names will
 be different in other clusters. Therefore, instead of using the names of
 the machines we use machine classes to generalize our setup. Those
 machine classes currently correspond to a physical machine each:
 .BL
 .LI
 .B Builder
 (xeon07): runs the nix-daemon and performs the builds in /nix. Requires
 root access to set up the nix-daemon.
 .LI 
 .B Target
 (MareNostrum 4 compute nodes): the nodes where the experiments 
 are executed. They don't need to have /nix installed or root access.
 .LI 
 .B Login
 (MareNostrum 4 login nodes): used to allocate resources and run jobs. It
 doesn't need to have /nix installed or root access.
 .LI 
 .B Laptop
 (where the keyboard is attached): used to connect to the other machines.
 Neither root access nor /nix is required, but it needs to be able to
 connect to the builder.
 .LE
 .\".P
 .\"The specific details of each machine class can be summarized in the
 .\"following table:
 .\".TS
 .\"center;
 .\"lB cB cB cB cB lB lB lB
 .\"lB  c  c  c  c  l  l  l.
 .\"_
 .\"Class	daemon	store	root	dl	cpus	space	cluster	node
 .\"_
 .\"laptop	no	no	no	yes	low	1GB	-	-
 .\"build	yes	yes	yes	yes	high	50GB	Cobi	xeon07
 .\"login	no	yes	no	no	low	MN4	mn1
 .\"target	no	yes	no	no	high	MN4	compute nodes
 .\"_
 .\".TE
 .P
 The machines don't need to be different from each other, as one machine
 can implement several classes. For example, the laptop can also act as
 the builder, but this is not recommended. The login machine could also
 perform the builds, but this is not possible yet in our setup.
 .\" ===================================================================
 .H 2 "Properties"
 .P
 We can define the following three properties:
 .BL 1m
 .LI
 R0: \fBSame\fP people on the \fBsame\fP machine obtain the same result
 .LI
 R1: \fBDifferent\fP people on the \fBsame\fP machine obtain the same result
 .LI
 R2: \fBDifferent\fP people on a \fBdifferent\fP machine obtain the same result
 .LE
 .P
 The garlic framework distinguishes two classes of results: the result of
 building a derivation, which are usually binary programs, and the
 results of the execution of an experiment.
 .P
 Building a derivation is usually R2: the result is bit-by-bit identical
 except in some rare cases. One example is that during the build process,
 a directory is listed by the order of the inodes, giving a random order
 which is different between builds. These problems are tracked by the
 .I https://r13y.com/
 project. In the minimal installation, less than 1% of the derivations
 don't achieve the R2 property.
 .P
 On the other hand, the results of the experiments are not yet R2, as
 they are tied to the target machine.
 .\" ===================================================================
 .H 1 "Preliminary steps"
 The peculiarities of our setup require that users perform some actions
 to use the garlic framework. The content of this section is only
 intended for the users of our machines, but can serve as a reference for
 other machines.
 .P
 The names of the machine classes are used in the command line prompt
 instead of the actual name of the machine, to indicate that the command
 needs to be executed in the stated machine class, for example:
 .DS I
 .VERBON
 builder% echo hi
 hi
 .VERBOFF
 .DE
 When the machine class is not important, it is ignored and only the
 "\f(CW%\fP" prompt appears.
 .\" ===================================================================
 .H 2 "Configure your laptop"
 .P
 To easily connect to the builder (xeon07) in one step, configure the SSH
 client to perform a jump over the Cobi login node. The
 .I ProxyJump
 directive is only available in OpenSSH version 7.3 and later. Add the
 following lines to the \f(CW\(ti/.ssh/config\fP file of your laptop:
 .DS I
 .VERBON
 Host cobi
      HostName ssflogin.bsc.es
      User your-username-here
 Host xeon07
      ProxyJump cobi
      HostName xeon07
      User your-username-here
 .VERBOFF
 .DE
 You should now be able to connect to the builder by typing:
 .DS I
 .VERBON
 laptop$ ssh xeon07
 .VERBOFF
 .DE
 To spot any problems, try the \f(CW-v\fP option to enable verbose
 output.
 .\" ===================================================================
 .H 2 "Configure the builder (xeon07)"
 .P
 In order to use nix you need to be able to download the sources
 from the Internet. Usually the download requires the ports 22, 80 and 443
 to be open for outgoing traffic.
 .P
 Check that you have network access in
 xeon07 provided by the environment variables \fIhttp_proxy\fP and
 \fIhttps_proxy\fP. Try to fetch a webpage with curl, to ensure the proxy
 is working:
 .DS I
 .VERBON
  xeon07$ curl x.com
  x
 .VERBOFF
 .DE
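 .P
 If the request fails, a first check (a minimal sketch; the values
 printed depend on your site configuration) is to confirm that both
 proxy variables are actually set in your shell:
 .DS I
 .VERBON
 xeon07$ env | grep -i proxy
 .VERBOFF
 .DE
 If nothing is printed, the variables are not defined in the current
 session.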
 .\" ===================================================================
 .H 3 "Create a new SSH key"
 .P
 There is one DSA key in your current home called "cluster" that is no
 longer supported in recent SSH versions and should not be used. Before
 removing it, create a new key without password protection, leaving the
 passphrase empty (unless you have already created one), by
 running:
 .DS I
 .VERBON
 xeon07$ ssh-keygen
 Generating public/private rsa key pair.
 Enter file in which to save the key (\(ti/.ssh/id_rsa):
 Enter passphrase (empty for no passphrase):
 Enter same passphrase again:
 Your identification has been saved in \(ti/.ssh/id_rsa.
 Your public key has been saved in \(ti/.ssh/id_rsa.pub.
 \&...
 .VERBOFF
 .DE
 By default it will create the public key at \f(CW\(ti/.ssh/id_rsa.pub\fP.
 Then add the newly created key to the authorized keys, so you can
 connect to other nodes of the Cobi cluster:
 .DS I
 .VERBON
 xeon07$ cat \(ti/.ssh/id_rsa.pub >> \(ti/.ssh/authorized_keys
 .VERBOFF
 .DE
 Finally, delete the old "cluster" key:
 .DS I
 .VERBON
 xeon07$ rm \(ti/.ssh/cluster \(ti/.ssh/cluster.pub
 .VERBOFF
 .DE
 And remove the section in the configuration \f(CW\(ti/.ssh/config\fP
 where the key was assigned to be used in all hosts along with the
 \f(CWStrictHostKeyChecking=no\fP option. Remove the following lines (if
 they exist):
 .DS I
 .VERBON
 Host *
    IdentityFile \(ti/.ssh/cluster
    StrictHostKeyChecking=no
 .VERBOFF
 .DE
 By default, the SSH client already searches for a keypair called
 \f(CW\(ti/.ssh/id_rsa\fP and \f(CW\(ti/.ssh/id_rsa.pub\fP, so there is
 no need to manually specify them.
 .P
 You should be able to access the login node with your new key by using:
 .DS I
 .VERBON
 xeon07$ ssh ssfhead
 .VERBOFF
 .DE
 .\" ===================================================================
 .H 3 "Authorize access to the repository"
 .P
 The sources of BSC packages are usually downloaded directly from the PM
 git server, so you must be able to access all repositories without a
 password prompt.
 .P
 Most repositories are open to read for logged in users, but there are
 some exceptions (for example the nanos6 repository) where you must have
 been explicitly granted read access.
 .P
 Copy the contents of your public SSH key in \f(CW\(ti/.ssh/id_rsa.pub\fP
 and paste it in GitLab at
 .DS I
 .VERBON
 https://pm.bsc.es/gitlab/profile/keys
 .VERBOFF
 .DE
 Then verify that the SSH connection to the server works and that you
 get a greeting from the GitLab server with your username:
 .DS I
 .VERBON
 xeon07$ ssh git@bscpm03.bsc.es
 PTY allocation request failed on channel 0
 Welcome to GitLab, @rarias!
 Connection to bscpm03.bsc.es closed.
 .VERBOFF
 .DE
 Verify that you can access the nanos6 repository (otherwise you 
 first need to ask to be granted read access), at:
 .DS I
 .VERBON
 https://pm.bsc.es/gitlab/nanos6/nanos6
 .VERBOFF
 .DE
 Finally, you should be able to download the nanos6 git 
 repository without any password interaction by running:
 .DS I
 .VERBON
 xeon07$ git clone git@bscpm03.bsc.es:nanos6/nanos6.git
 .VERBOFF
 .DE
 This will create the nanos6 directory.
 .\" ===================================================================
 .H 3 "Authorize access to MareNostrum 4"
 You will also need to access MareNostrum 4 from the xeon07 machine, in 
 order to run experiments. Add the following lines to the 
 \f(CW\(ti/.ssh/config\fP file and set your user name:
 .DS I
 .VERBON
 Host mn0 mn1 mn2
    User <your user name in MN4>
 .VERBOFF
 .DE
 Then copy your SSH key to MareNostrum 4 (it will ask you for your login
 password):
 .DS I
 .VERBON
 xeon07$ ssh-copy-id -i \(ti/.ssh/id_rsa.pub mn1
 .VERBOFF
 .DE
 Finally, ensure that you can connect without a password:
 .DS I
 .VERBON
 xeon07$ ssh mn1
 \&...
 login1$
 .VERBOFF
 .DE
 .\" ===================================================================
 .H 3 "Clone the bscpkgs repository"
 .P
 Once you have Internet access and have been granted access to the PM
 GitLab repositories, you can begin building software with nix. First ensure
 that the nix binaries are available from your shell in xeon07:
 .DS I
 .VERBON
 xeon07$ nix --version
 nix (Nix) 2.3.6
 .VERBOFF
 .DE
 Now you are ready to build and install packages with nix. Clone the 
 bscpkgs repository:
 .DS I
 .VERBON
 xeon07$ git clone git@bscpm03.bsc.es:rarias/bscpkgs.git
 .VERBOFF
 .DE
 Nix looks in the current folder for a file named \f(CWdefault.nix\fP for
 packages, so go to the bscpkgs directory:
 .DS I
 .VERBON
 xeon07$ cd bscpkgs
 .VERBOFF
 .DE
 Now you should be able to build nanos6 (which is probably already
 compiled):
 .DS I
 .VERBON
 xeon07$ nix-build -A bsc.nanos6
 \&...
 /nix/store/...2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32
 .VERBOFF
 .DE
 The installation is placed in the nix store (with the path stated in 
 the last line of the build process), with the \f(CWresult\fP symbolic
 link pointing to the same location:
 .DS I
 .VERBON
 xeon07$ readlink result
 /nix/store/...2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32
 .VERBOFF
 .DE
 .\" ===================================================================
 .H 2 "Configure the login and target (MareNostrum 4)"
 .P
 In order to execute the programs in MareNostrum 4, you first need to load
 some utilities in the PATH. Add to the end of the file
 \f(CW\(ti/.bashrc\fP in MareNostrum 4 the following line:
 .DS I
 .VERBON
 export PATH=/gpfs/projects/bsc15/nix/bin:$PATH
 .VERBOFF
 .DE
 Then log out and log in again (or source the \f(CW\(ti/.bashrc\fP file)
 and check that now you have the \f(CWnix-develop\fP command available:
 .DS I
 .VERBON
 login1$ which nix-develop
 /gpfs/projects/bsc15/nix/bin/nix-develop
 .VERBOFF
 .DE
 The new utilities are available both in the login nodes and in the
 compute (target) nodes, as they share the file system over the network.
 .\" ===================================================================
 .H 1 "Overview"
 .P
 The garlic framework is designed to fulfill all the requirements of an
 experimenter in all the steps up to publication. The experience gained
 while using it suggests that we move along three stages depicted in the
 following diagram:
 .DS CB
 .S 9p 10p
 .PS 5
 linewid=1;
 right
 box "Source" "code"
 arrow "Development" above
 box "Program"
 arrow "Experiment" above
 box "Results"
 arrow "Data" "exploration"
 box "Figures"
 .PE
 .S P P
 .DE
 In the development phase the experimenter changes the source code in
 order to introduce new features or fix bugs. Once the program is
 considered functional, the next phase is the experimentation, where
 several experiment configurations are tested to evaluate the program. It
 is common that some problems are spotted during this phase, which lead
 the experimenter to go back to the development phase and change the
 source code.
 .P
 Finally, when the experiment is considered completed, the
 experimenter moves to the next phase, which involves the exploration of
 the data generated by the experiment. During this phase, it is common to
 generate results in the form of plots or tables which provide a clear
 insight into the quantities of interest. It is also common that after
 looking at the figures, some changes in the experiment configuration
 need to be introduced (or even in the source code of the program).
 .P
 Therefore, the experimenter may move forwards and backwards along the
 three phases several times. The garlic framework provides support for
 all three stages (with different degrees of maturity).
 .H 1 "Development (work in progress)"
 .P
 During the development phase, a functional program is produced by
 modifying its source code. This process is generally cyclic: the
 developer needs to compile, debug and correct mistakes. We want to
 minimize the delay times, so the programs can be executed as soon as
 needed, but under a controlled environment so that the same behavior
 occurs during the experimentation phase.
 .P
 In particular, we want several developers to be able to reproduce the
 same development environment so they can debug each other's programs
 when reporting bugs. Therefore, the environment must be carefully
 controlled to avoid non-reproducible scenarios.
 .P
 The current development environment provides an isolated shell with a
 clean environment, which runs in a new mount namespace where access to
 the filesystem is restricted. Only the project directory and the nix
 store are available (with some other exceptions), to ensure that you
 cannot accidentally link with the wrong library or modify the build
 process with a forgotten environment variable in the \f(CW\(ti/.bashrc\fP
 file.
 .\" ===================================================================
 .H 2 "Getting the development tools"
 .P
 To create a development
 environment, first copy or download the sources of your program (not the
 dependencies) in a new directory placed in the target machine
 (MareNostrum\~4).
 .P
 The default environment contains packages commonly used to develop
 programs, listed in the \fIgarlic/index.nix\fP file:
 .\" FIXME: Unify garlic.unsafeDevelop in garlic.develop, so we can
 .\" specify the packages directly
 .DS I
 .VERBON
 develop = let 
  commonPackages = with self; [
    coreutils htop procps-ng vim which strace
    tmux gdb kakoune universal-ctags bashInteractive
    glibcLocales ncurses git screen curl
    # Add more nixpkgs packages here...
  ];  
  bscPackages = with bsc; [
    slurm clangOmpss2 icc mcxx perf tampi impi
    # Add more bsc packages here...
  ];
  ...
 .VERBOFF
 .DE
 If you need additional packages, add them to the list, so that they
 become available in the environment. Those may include any dependency
 required to build your program.
 .P
 Then use the build machine (xeon07) to build the
 .I garlic.develop
 derivation:
 .DS I
 .VERBON
 build% nix-build -A garlic.develop
 \&...
 build% grep ln result
 ln -fs /gpfs/projects/.../bin/stage1 .nix-develop
 .VERBOFF
 .DE
 Copy the \fIln\fP command and run it in the target machine
 (MareNostrum\~4), inside the new directory used for your program
 development, to create the link \fI.nix-develop\fP (which is used to
 remember your environment). Several environments can be stored in
 different directories using this method, with different packages in each
 environment. You will need
 to rebuild the
 .I garlic.develop
 derivation and update the
 .I .nix-develop
 link after the package list is changed. Once the
 environment link is created, there is no need to repeat these steps again.
 .P
 Before entering the environment, you will need to access the required
 resources for your program, which may include several compute nodes.
 .\" ===================================================================
 .H 2 "Allocating resources for development"
 .P
 Our target machine (MareNostrum 4) provides an interactive shell, that
 can be requested with the number of computational resources required for
 development. To do so, connect to the login node and allocate an
 interactive session:
 .DS I
 .VERBON
 % ssh mn1
 login% salloc ...
 target%
 .VERBOFF
 .DE
 This operation may take some minutes to complete depending on the load
 of the cluster. But once the session is ready, any subsequent execution
 of programs will be immediate.
 .\" ===================================================================
 .H 2 "Accessing the development environment"
 .P
 The utility program \fInix-develop\fP has been designed to access the
 development environment of the current directory, by looking for the
 \fI.nix-develop\fP file. It creates a namespace where the required
 packages are installed and ready to be used. Now you can access the
 newly created environment by running:
 .DS I
 .VERBON
 target% nix-develop
 develop%
 .VERBOFF
 .DE
 The spawned shell contains all the packages pre-defined in the
 \fIgarlic.develop\fP derivation, and can now be accessed by typing the
 name of the commands.
 .DS I
 .VERBON
 develop% which gcc
 /nix/store/azayfhqyg9...s8aqfmy-gcc-wrapper-9.3.0/bin/gcc
 develop% which gdb
 /nix/store/1c833b2y8j...pnjn2nv9d46zv44dk-gdb-9.2/bin/gdb
 .VERBOFF
 .DE
 If you need additional packages, you can add them in the
 \fIgarlic/index.nix\fP file as mentioned previously. To keep the
 same current resources, so you don't need to wait again for the
 resources to be allocated, exit only from the development shell:
 .DS I
 .VERBON
 develop% exit
 target%
 .VERBOFF
 .DE
 Then update the
 .I .nix-develop
 link and enter into the new develop environment:
 .DS I
 .VERBON
 target% nix-develop
 develop%
 .VERBOFF
 .DE
 .\" ===================================================================
 .H 2 "Execution"
 The allocated shell can only execute tasks in the current node, which
 may be enough for some tests. To do so, you can directly run your
 program as:
 .DS I
 .VERBON
 develop$ ./program
 .VERBOFF
 .DE
 If you need to run a multi-node program, typically using MPI
 communications, then you can do so by using srun. Notice that you need
 to allocate several nodes when calling salloc previously. The srun
 command will execute the given program \fBoutside\fP the development
 environment if executed as-is. So we re-enter the develop environment by
 calling nix-develop as a wrapper of the program:
 .\" FIXME: wrap srun to reenter the develop environment by its own
 .DS I
 .VERBON
 develop$ srun nix-develop ./program
 .VERBOFF
 .DE
 .\" ===================================================================
 .H 2 "Debugging"
 The debugger can be used to directly execute the program, if it is
 executed in only one node, by using:
 .DS I
 .VERBON
 develop$ gdb ./program
 .VERBOFF
 .DE
 Or it can be attached to an already running program by using its PID.
 You will need to first connect to the node running it (say target2), and
 run gdb inside the nix-develop environment. Use
 .I squeue
 to see the compute nodes running your program: 
 .DS I
 .VERBON
 login$ ssh target2
 target2$ cd project-develop
 target2$ nix-develop
 develop$ gdb -p $pid
 .VERBOFF
 .DE
 You can repeat this step to control the execution of programs running in
 different nodes simultaneously.
 .P
 In those cases where the program crashes before being able to attach the
 debugger, enable the generation of core dumps:
 .DS I
 .VERBON
 develop$ ulimit -c unlimited
 .VERBOFF
 .DE
 And rerun the program, which will generate a core file that can be
 opened by gdb and contains the state of the memory when the crash
 happened. Beware that the core dump file can be very large, depending on
 the memory used by your program at the crash.
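 .P
 As a minimal sketch, assuming that the dump is written to a file named
 \f(CWcore\fP in the current directory (the actual name and location
 depend on the system configuration), the core file can be loaded
 together with the program binary:
 .DS I
 .VERBON
 develop$ gdb ./program core
 (gdb) bt
 .VERBOFF
 .DE
 The \f(CWbt\fP command prints the backtrace of the program at the time
 of the crash.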
 .H 2 "Git branch name convention"
 .P
 The garlic benchmark imposes a set of requirements to be met for each
 application in order to coordinate the execution of the benchmark and 
 the gathering process of the results.
 .P
 Each application must be available in a git repository so it can be 
 included into the garlic benchmark. The different combinations of 
 programming models and communication schemes should be each placed in 
 one git branch, which are referred to as \fIbenchmark branches\fP. At
 least one benchmark branch should exist and they all must begin with the
 prefix \f(CWgarlic/\fP (other branches will be ignored).
 .P
 The branch name is formed by adding keywords separated by the "+" 
 character. The keywords must follow the given order and each can
 appear at most once. At least one keyword must be included. The
 following keywords are available:
 .LB 12 2 0 0
 .LI \f(CWmpi\fP
 A significant fraction of the communications uses only the standard MPI
 (without extensions like TAMPI).
 .LI \f(CWtampi\fP
 A significant fraction of the communications uses TAMPI.
 .LI \f(CWsend\fP
 A significant part of the MPI communication uses the blocking family of
 methods (MPI_Send, MPI_Recv, MPI_Gather...).
 .LI \f(CWisend\fP
 A significant part of the MPI communication uses the non-blocking family
 of methods (MPI_Isend, MPI_Irecv, MPI_Igather...).
 .LI \f(CWrma\fP
 A significant part of the MPI communication uses remote memory access
 (one-sided) methods (MPI_Get, MPI_Put...).
 .LI \f(CWseq\fP
 The complete execution is sequential in each process (one thread per
 process).
 .LI \f(CWomp\fP
 A significant fraction of the execution uses the OpenMP programming
 model.
 .LI \f(CWoss\fP
 A significant fraction of the execution uses the OmpSs-2 programming
 model.
 .LI \f(CWtask\fP
 A significant part of the execution involves the use of the tasking
 model.
 .LI \f(CWtaskfor\fP
 A significant part of the execution uses the taskfor construct.
 .LI \f(CWfork\fP
 A significant part of the execution uses the fork-join model (including
 hybrid programming techniques with parallel computations and sequential
 communications).
 .LI \f(CWsimd\fP
 A significant part of the computation has been optimized to use SIMD
 instructions.
 .LE
 .P
 In the \fBAppendix A\fP there is a flowchart to help the decision
 process of the branch name.
 .P
 Additional user defined keywords may be added at the end using the 
 separator "+" as well. User keywords must consist of uppercase
 alphanumeric characters only and be kept short. These additional
 keywords must be different (case insensitive) from those already
 defined above. Some examples:
 .DS I
 .VERBON
 garlic/mpi+send+seq
 garlic/mpi+send+omp+fork
 garlic/mpi+isend+oss+task
 garlic/tampi+isend+oss+task
 garlic/tampi+isend+oss+task+COLOR
 garlic/tampi+isend+oss+task+COLOR+BTREE
 .VERBOFF
 .DE
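 .P
 For instance, a benchmark branch following this convention could be
 created and published as sketched below. The remote name
 \f(CWorigin\fP is an assumption and may differ in your repository:
 .DS I
 .VERBON
 % git checkout -b garlic/mpi+send+seq
 % git push -u origin garlic/mpi+send+seq
 .VERBOFF
 .DE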
 .\" ===================================================================
 .H 1 "Experimentation"
 The experimentation phase begins with a functional program which is the
 object of study. The experimenter then designs an experiment aimed at
 measuring some properties of the program. The experiment is then
 executed and the results are stored for further analysis.
 .H 2 "Writing the experiment configuration"
 .P
 The term experiment is quite overloaded in this document. We are going
 to see how to write the recipe that describes the execution pipeline of
 an experiment.
 .P
 Within the garlic benchmark, experiments are typically sorted by a
 hierarchy depending on the application to which they belong. Take a look at the
 \fCgarlic/exp\fP directory and you will find some folders and .nix
 files.
 .P
 Each of those recipe files describes a function that returns a
 derivation, which, once built, will result in the first stage script of
 the execution pipeline.
 .P
 The first part states the names of the attributes required as the
 input of the function, typically some packages, common tools and options:
 .DS I
 .VERBON
 {
  stdenv
 , stdexp
 , bsc
 , targetMachine
 , stages
 , garlicTools
 }:
 .VERBOFF
 .DE
 .P
 Notice the \fCtargetMachine\fP argument, which provides information
 about the machine in which the experiment will run. You should write
 your experiment in such a way that it runs in multiple clusters.
 .DS I
 .VERBON
 varConf = {
  blocks = [ 1 2 4 ];
  nodes = [ 1 ];
 };
 .VERBOFF
 .DE
 .P
 The \fCvarConf\fP is the attribute set that allows you to vary some
 factors in the experiment.
 .DS I
 .VERBON
 genConf = var: fix (self: targetMachine.config // {
  expName = "example";
  unitName = self.expName + "-b" + toString self.blocks;
  blocks = var.blocks;
  nodes = var.nodes;
  cpusPerTask = 1;
  tasksPerNode = self.hw.socketsPerNode;
 });
 .VERBOFF
 .DE
 .P
 The \fCgenConf\fP function is the central part of the description of the
 experiment. It takes as input \fBone\fP configuration from the cartesian
 product of
 .I varConf
 and returns the complete configuration. In our case, it will be
 called 3 times, with the following inputs at each time:
 .DS I
 .VERBON
 { blocks = 1; nodes = 1; }
 { blocks = 2; nodes = 1; }
 { blocks = 4; nodes = 1; }
 .VERBOFF
 .DE
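 .P
 As a worked example of the cartesian product, suppose (hypothetically)
 that \fCnodes\fP were \fC[ 1 2 ]\fP instead. There would then be
 3 \(mu 2 = 6 combinations, and the function would be called once for
 each of them (the order shown here is only illustrative):
 .DS I
 .VERBON
 { blocks = 1; nodes = 1; }
 { blocks = 1; nodes = 2; }
 { blocks = 2; nodes = 1; }
 { blocks = 2; nodes = 2; }
 { blocks = 4; nodes = 1; }
 { blocks = 4; nodes = 2; }
 .VERBOFF
 .DE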
 .P
 The return value can be inspected by calling the function in the
 interactive nix repl:
 .DS I
 .VERBON
 nix-repl> genConf { blocks = 2; nodes = 1; }
 {
  blocks = 2;
  cpusPerTask = 1;
  expName = "example";
  hw = { ... };
  march = "skylake-avx512";
  mtune = "skylake-avx512";
  name = "mn4";
  nixPrefix = "/gpfs/projects/bsc15/nix";
  nodes = 1;
  sshHost = "mn1";
  tasksPerNode = 2;
  unitName = "example-b2";
 }
 .VERBOFF
 .DE
 .P
 Some configuration parameters were added by
 .I targetMachine.config ,
 such as the
 .I nixPrefix ,
 .I sshHost
 or the
 .I hw
 attribute set, which are specific to the cluster where the experiment is
 going to run. Also, the
 .I unitName
 got assigned the proper name based on the number of blocks, and the
 number of tasks per node was assigned based on the hardware description
 of the target machine.
 .P
 By following this rule, the experiments can easily be ported to machines
 with other hardware characteristics, and we only need to define the
 hardware details once. Then all the experiments will be updated based on
 those details.
 .H 2 "First steps"
 .P
 The complete results generally take a long time to be finished, so it is
 advisable to design the experiments iteratively, in order to quickly
 obtain some feedback. Some recommendations:
 .BL
 .LI
 Start with one unit only.
 .LI
 Set the number of runs low (say 5) but more than one.
 .LI
 Use a small problem size, so the execution time is low.
 .LI
 Set the time limit low, so deadlocks are caught early.
 .LE
 .P
 As soon as the first runs are complete, examine the results and test
 that everything looks good. You would likely want to check:
 .BL
 .LI
 The resources were assigned as intended (nodes and CPU affinity).
 .LI
 No errors or warnings: look at stderr and stdout logs.
 .LI
 If a deadlock happens, it will run out of the time limit.
 .LE
 .P
 As you gain confidence that the execution went as planned, begin
 increasing the problem size, the number of runs, the time limit and
 lastly the number of units. The rationale is that each unit that is
 shared among experiments gets assigned the same hash. Therefore, you can
 iteratively add more units to an experiment, and those that were already
 executed (and whose results were generated) are reused.
 .SK
 .APP "" "Branch name diagram"
 .DS CB
 .S -3 10
 .PS 4.4/25.4
 copy "gitbranch.pic"
 .PE
 .S P P
 .DE
 .TC
--- a/garlic/doc/ug.ms
+++ b/garlic/doc/ug.ms