user guide: expand the develop section
parent
95809bd2bf
commit
60cab85fc4
@ -1,5 +1,5 @@
|
|||||||
all: execution.pdf execution.utf8 execution.ascii pp.pdf pp.utf8 pp.ascii\
|
all: execution.pdf execution.utf8 execution.ascii pp.pdf pp.utf8 pp.ascii\
|
||||||
branch.pdf blackbox.pdf
|
branch.pdf blackbox.pdf ug.mm.pdf
|
||||||
|
|
||||||
TTYOPT=-rPO=4m -rLL=72m
|
TTYOPT=-rPO=4m -rLL=72m
|
||||||
PDFOPT=-dpaper=a4 -rPO=4c -rLL=13c
|
PDFOPT=-dpaper=a4 -rPO=4c -rLL=13c
|
||||||
@ -7,6 +7,9 @@ PREPROC=-k -t -p -R
|
|||||||
|
|
||||||
blackbox.pdf: blackbox.ms Makefile
|
blackbox.pdf: blackbox.ms Makefile
|
||||||
REFER=ref.i groff -ms $(PREPROC) -dpaper=a4 -rPO=2c -rLL=17c -Tpdf $< > $@
|
REFER=ref.i groff -ms $(PREPROC) -dpaper=a4 -rPO=2c -rLL=17c -Tpdf $< > $@
|
||||||
|
|
||||||
|
%.mm.pdf: %.mm Makefile
|
||||||
|
groff -mm $(PREPROC) -Tpdf $< > $@
|
||||||
-killall -HUP mupdf
|
-killall -HUP mupdf
|
||||||
|
|
||||||
%.pdf: %.ms Makefile
|
%.pdf: %.ms Makefile
|
||||||
|
187
garlic/doc/ug.mm
@ -1,3 +1,5 @@
|
|||||||
|
\"Header point size
|
||||||
|
.ds HP "15 12 12 0 0 0 0 0 0 0 0 0 0 0"
|
||||||
.COVER
|
.COVER
|
||||||
.TL
|
.TL
|
||||||
Garlic: User guide
|
Garlic: User guide
|
||||||
@ -5,29 +7,183 @@ Garlic: User guide
|
|||||||
.AU "Rodrigo Arias Mallo"
|
.AU "Rodrigo Arias Mallo"
|
||||||
.COVEND
|
.COVEND
|
||||||
.H 1 "Overview"
|
.H 1 "Overview"
|
||||||
Dependency graph of a complete experiment that produces a figure. Each box
|
.P
|
||||||
is a derivation and arrows represent \fBbuild dependencies\fP.
|
The garlic framework is designed to fulfill all the requirements of an
|
||||||
|
experimenter in all the steps up to publication. The experience gained
|
||||||
|
while using it suggests that we move along three stages depicted in the
|
||||||
|
following diagram:
|
||||||
.DS CB
|
.DS CB
|
||||||
.PS
|
.PS
|
||||||
linewid=0.9;
|
linewid=0.9;
|
||||||
right
|
right
|
||||||
box "Source" "code"
|
box "Source" "code"
|
||||||
arrow <-> "Develop" above
|
arrow "Development" above
|
||||||
box "Program"
|
box "Program"
|
||||||
arrow <-> "Experiment" above
|
arrow "Experiment" above
|
||||||
box "Results"
|
box "Results"
|
||||||
arrow <-> "Data" "exploration"
|
arrow "Data" "exploration"
|
||||||
box "Figures"
|
box "Figures"
|
||||||
.PE
|
.PE
|
||||||
.DE
|
.DE
|
||||||
.H 1 "Development"
|
In the development phase the experimenter changes the source code in
|
||||||
|
order to introduce new features or fix bugs. Once the program is
|
||||||
|
considered functional, the next phase is the experimentation, where
|
||||||
|
several experiment configurations are tested to evaluate the program. It
|
||||||
|
is common that some problems are spotted during this phase, which lead
|
||||||
|
the experimenter to go back to the development phase and change the
|
||||||
|
source code.
|
||||||
.P
|
.P
|
||||||
The development phase consists in creating a functional program by
|
Finally, when the experiment is considered completed, the
|
||||||
modifying the source code. This process is generally cyclic, where the
|
experimenter moves to the next phase, which involves the exploration of
|
||||||
developer needs to compile the program, correct mistakes and debug the
|
the data generated by the experiment. During this phase, it is common to
|
||||||
program.
|
generate results in the form of plots or tables which provide a clear
|
||||||
|
insight into the quantities of interest. It is also common that after
|
||||||
|
looking at the figures, some changes in the experiment configuration
|
||||||
|
need to be introduced (or even in the source code of the program).
|
||||||
.P
|
.P
|
||||||
It requires to be running in the target machine.
|
Therefore, the experimenter may move back and forth between the three
|
||||||
|
phases several times. The garlic framework provides support for all the
|
||||||
|
three stages (with different degrees of maturity).
|
||||||
|
.H 1 "Development (work in progress)"
|
||||||
|
.P
|
||||||
|
During the development phase, a functional program is produced by
|
||||||
|
modifying its source code. This process is generally cyclic: the
|
||||||
|
developer needs to compile, debug and correct mistakes. We want to
|
||||||
|
minimize the delay times, so the programs can be executed as soon as
|
||||||
|
needed, but under a controlled environment so that the same behavior
|
||||||
|
occurs during the experimentation phase.
|
||||||
|
.P
|
||||||
|
The development phase is typically carried out directly on the target
|
||||||
|
machine, so we need the resources first.
|
||||||
|
.H 2 "Allocating resources for development"
|
||||||
|
.P
|
||||||
|
Our target machine (MareNostrum 4) provides an interactive shell, which
|
||||||
|
can be requested with the number of computational resources required for
|
||||||
|
development.
|
||||||
|
.P
|
||||||
|
To do so, connect to it and allocate an interactive session:
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
build% ssh target
|
||||||
|
target% salloc ...
|
||||||
|
compute%
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
This operation may take some minutes to complete depending on the load
|
||||||
|
of the cluster. But once the session is ready, any subsequent execution
|
||||||
|
will be immediate.
|
||||||
|
.H 2 "Getting the development tools"
|
||||||
|
.P
|
||||||
|
In order to get the same packages provided for the experiments, we can
|
||||||
|
use the \fInix-develop\fP utility, which creates a namespace where the
|
||||||
|
required packages are installed. Use the build machine to generate a
|
||||||
|
develop environment:
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
build% nix-build -A garlic.develop
|
||||||
|
\&...
|
||||||
|
build% grep ln result
|
||||||
|
ln -fs /gpfs/projects/bsc15/nix/...olate/bin/stage1 .nix-develop
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
Copy the \fIln\fP command and run it in the target machine, in a new
|
||||||
|
directory used for your program development. The link will be placed in
|
||||||
|
a hidden file named \fI.nix-develop\fP and will be used to remember your
|
||||||
|
environment. Several environments can be stored using this method, with
|
||||||
|
different packages on each.
|
||||||
|
.P
|
||||||
|
Now you can access the newly created environment by running:
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
compute% nix-develop
|
||||||
|
develop%
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
The spawned shell contains all the packages pre-defined in the
|
||||||
|
\fIgarlic.develop\fP derivation, which can now be accessed by typing the
|
||||||
|
name of the commands.
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
develop$ which gcc
|
||||||
|
/nix/store/azayfhqyg9...s8aqfmy-gcc-wrapper-9.3.0/bin/gcc
|
||||||
|
develop$ which gdb
|
||||||
|
/nix/store/1c833b2y8j...pnjn2nv9d46zv44dk-gdb-9.2/bin/gdb
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
If you need additional packages, you can add them in the
|
||||||
|
\fIgarlic/index.nix\fP file:
|
||||||
|
.\" FIXME: Unify garlic.unsafeDevelop in garlic.develop, so we can
|
||||||
|
.\" specify the packages directly
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
unsafeDevelop = callPackage ./develop/default.nix {
|
||||||
|
extraInputs = with self; [
|
||||||
|
coreutils htop procps-ng vim which strace
|
||||||
|
tmux gdb kakoune universal-ctags bashInteractive
|
||||||
|
glibcLocales ncurses git screen curl
|
||||||
|
# Add more nixpkgs packages here...
|
||||||
|
bsc.slurm bsc.clangOmpss2 bsc.icc bsc.mcxx bsc.perf
|
||||||
|
# Add more bscpkgs packages here...
|
||||||
|
];
|
||||||
|
};
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
Then repeat the steps above to build the new develop environment.
|
||||||
|
.H 2 "Execution"
|
||||||
|
The allocated shell can only execute tasks in the current node, which
|
||||||
|
may be enough for some tests. To do so, you can directly run your
|
||||||
|
program as:
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
develop$ ./program
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
If you need to run a multi-node program, typically using MPI
|
||||||
|
communications, then you can do so by using srun. Notice that you need
|
||||||
|
to allocate several nodes when calling salloc previously. The srun
|
||||||
|
command will execute the given program \fBoutside\fP the develop
|
||||||
|
environment if executed as-is. So we re-enter the develop environment by
|
||||||
|
calling nix-develop as a wrapper of the program:
|
||||||
|
.\" FIXME: wrap srun to reenter the develop environment by its own
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
develop$ srun nix-develop ./program
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
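The wrapping step above can be shortened with a small shell function. This is only a sketch: the name dsrun is hypothetical and is not provided by garlic, and it assumes that both srun and nix-develop are reachable from the develop shell, as in the session shown above.

```shell
# Hypothetical helper (not part of garlic): run a program through srun
# while re-entering the develop environment on every node.
# Assumes srun and nix-develop are both available in PATH.
dsrun() {
    srun nix-develop "$@"
}

# Usage, from the develop shell:
#   dsrun ./program
```

All the arguments are forwarded unchanged, so the program still receives its own options after the program name.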
|
.H 2 "Debugging"
|
||||||
|
The debugger can be used to directly execute the program if it is executed
|
||||||
|
in only one node by using:
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
develop$ gdb ./program
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
Or it can be attached to an already running program by using its pid.
|
||||||
|
You will need to first connect to the node running it, and run gdb
|
||||||
|
inside the nix-develop environment. Use squeue to see the compute nodes
|
||||||
|
running your program:
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
target$ ssh compute
|
||||||
|
compute$ cd project-develop
|
||||||
|
compute$ nix-develop
|
||||||
|
develop$ gdb -p $pid
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
You can repeat this step in other nodes to control the execution in
|
||||||
|
multiple nodes.
|
||||||
|
.P
|
||||||
|
In those cases where the program crashes before you can attach the
|
||||||
|
debugger, you can enable the generation of core dumps:
|
||||||
|
.DS I
|
||||||
|
.VERBON
|
||||||
|
develop$ ulimit -c unlimited
|
||||||
|
.VERBOFF
|
||||||
|
.DE
|
||||||
|
And rerun the program, which will generate a core file that can be
|
||||||
|
opened by gdb and contains the state of the memory when the crash
|
||||||
|
happened. Beware that the core dump file can be very large, depending on
|
||||||
|
the memory used by your program at the crash.
|
||||||
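Once a core file exists, it can be loaded in gdb together with the program that produced it. The following is a minimal sketch: the helper name latest_core is hypothetical, and it assumes that the core file is written to the given directory with a name starting with "core", which depends on how the kernel of the compute node is configured.

```shell
# Hypothetical helper: print the most recently modified file whose name
# starts with "core" in the given directory (default: current directory).
# Assumes core files are written there with that prefix; the real name
# and location depend on the system configuration.
latest_core() {
    ls -t "${1:-.}"/core* 2>/dev/null | head -n 1
}

# Usage, from the develop shell, after the program has crashed:
#   gdb ./program "$(latest_core)"
```

If no matching file is found the function prints nothing, so check the output before handing it to gdb.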
.\" ===================================================================
|
.\" ===================================================================
|
||||||
.H 1 "Experimentation"
|
.H 1 "Experimentation"
|
||||||
The experimentation phase begins with a functional program which is the
|
The experimentation phase begins with a functional program which is the
|
||||||
@ -178,4 +334,13 @@ lastly the number of units. The rationale is that each unit that is
|
|||||||
shared among experiments gets assigned the same hash. Therefore, you can
|
shared among experiments gets assigned the same hash. Therefore, you can
|
||||||
iteratively add more units to an experiment, and if they are already
|
iteratively add more units to an experiment, and if they are already
|
||||||
executed (and the results were generated), they are reused.
|
executed (and the results were generated), they are reused.
|
||||||
|
.SK
|
||||||
|
.H 1 "Annex A: Branch name diagram"
|
||||||
|
.DS CB
|
||||||
|
.S -2
|
||||||
|
.PS 4.6/25.4
|
||||||
|
copy "gitbranch.pic"
|
||||||
|
.PE
|
||||||
|
.S P
|
||||||
|
.DE
|
||||||
.TC
|
.TC
|
||||||
|