WIP: postprocessing doc

2020-11-04 12:56:35 +01:00
parent 62c9da2474
commit f0122d557f
2 changed files with 96 additions and 37 deletions
--- a/garlic/doc/Makefile
+++ b/garlic/doc/Makefile
@@ -1,8 +1,14 @@
-all: execution.pdf execution.txt pp.pdf pp.txt
+all: execution.pdf execution.ascii pp.pdf pp.ascii
 TTYOPT=-rPO=4m -rLL=72m
 #TTYOPT=-dpaper=a0 -rPO=4m -rLL=72m
 %.pdf: %.ms
-	groff -ms -t -p -Tpdf $^ > $@
+	REFER=ref.i groff -ms -t -p -R -Tpdf $^ > $@
 	-killall -HUP mupdf
-%.txt: %.ms
+%.utf8: %.ms
-	groff -ms -t -p -Tutf8 $^ > $@
+	REFER=ref.i groff -ms -t -p -R $(TTYOPT) -Tutf8 $^ > $@
 %.ascii: %.ms
 	REFER=ref.i groff -ms -t -p -R $(TTYOPT) -Tascii $^ > $@
--- a/garlic/doc/pp.ms
+++ b/garlic/doc/pp.ms
@@ -1,75 +1,128 @@
 .TL
-Garlic: experiment results
+Garlic: the post-processing pipeline
 .AU
 Rodrigo Arias Mallo
 .AI
 Barcelona Supercomputing Center
 .AB
 .LP
 This document explains the stages that follow the execution of an
 experiment. We define the post-processing pipeline as the steps that go
 from the data generated by the experiment to a set of plots or tables
 that present the data in a human-readable form.
 .AE
 .\"#####################################################################
 .nr GROWPS 3
 .nr PSINCR 1.5p
 .\".nr PD 0.5m
 .nr PI 2m
-\".2C
+.\".2C
 .R1
 bracket-label " [" ] ", "
 accumulate
 .R2
 .\"#####################################################################
 .NH 1
 Introduction
 .LP
 After the correct execution of an experiment some measurements are
 recorded in the results for further investigation. Typically the time of
 the execution is measured and presented later in a plot or a table. The
 steps to analyze the results and present them in a convenient way are
 called the
 .I "post-processing pipeline" .
 Similarly to the execution pipeline
 .[
 garlic execution
 .]
 where several stages run sequentially, the
 post-processing pipeline is also formed by multiple stages executed in
 order.
 .PP
 The rationale behind dividing execution and post-processing is
 that usually the experiments are costly to run (they take a long time to
 complete) while generating a plot is usually quicker. Refining the plots
 multiple times reusing the same experimental results doesn't require the
 execution of the complete experiment, so the experimenter can try
 multiple ways to present the data in a rapid cycle.
 .NH 1
 Fetching the results
 .LP
 Consider a program of interest for which an experiment has been designed to
-measure some properties. When the experiment is executed, it will generate some
+measure some properties that the experimenter wants to present in a
-results which are generally non-deterministic. The experimenter may want to
+visual plot. When the experiment is launched, the execution
-present some information in a visual plot or graph based on these results.
+pipeline (EP) is completely executed and it will generate some
-.PP
+results. In this scenario, the execution pipeline depends on the
-In this escenario, the experiment depends on the program\[em]any
+program\[em]any changes in the program will cause nix to build it again
-changes in the program will cause nix to build the experiment again using the
+using the updated program. The results will also depend on the
-updated program. The results will also depend on the experiment, and
+execution pipeline, and the graph on the results. This chain of
-the graph on the results. This chain of dependencies can be shown in
+dependencies can be shown in the following dependency graph:
-the following dependency tree:
+.\"circlerad=0.22; arrowhead=7;
 .PS
 right
 circlerad=0.22; arrowhead=7;
 circle "Prog"
 arrow
-circle "Exp"
+circle "EP"
 arrow
 circle "Result"
 arrow
-circle "Graph"
+circle "PP"
 arrow
 circle "Plot"
 .PE
 Ideally, the dependencies should be handled by nix, so it can detect any
 change and rebuild the necessary parts automatically. Unfortunately, nix
-is not able to build R as a derivation directly as it requires access
+is not able to build the result as a derivation directly as it requires access
 to the
 .I "target cluster"
-with several user accounts. In addition, the results are often
+with several user accounts. In order to let several users reuse the same results from a cache, we
-non-deterministic so the graph G cannot depend on the content of the
+use the
 results.
 .PP
 In order to let several users use the results from a cache, we use the
 .I "nix store"
-to make them available for read only. To generate the results from the
+to make them available. To generate the results from the
 experiment, we add some extra steps that must be executed manually.
 .PS
 right
 circlerad=0.22; arrowhead=7;
 circle "Prog"
 arrow
-E: circle "Exp"
+E: circle "EP"
-RUN: circle "Run" at E + (0.8,-0.5)
+RUN: circle "Run" at E + (0.8,-0.5) dashed
-FETCH: circle "Fetch" at E + (1.6,-0.5)
+FETCH: circle "Fetch" at E + (1.6,-0.5) dashed
 R: circle "Result" at E + (2.4,0)
 arrow
-G: circle "Graph"
+P: circle "PP"
 arrow
 circle "Plot"
 arrow dashed from E to RUN chop
 arrow dashed from RUN to FETCH chop
 arrow dashed from FETCH to R chop
 arrow from E to R chop
 .PE
 The run and fetch steps are provided by the helper tool
-.I garlic ,
+.I "garlic(1)" ,
-which launches the experiment using the user credential at the
+which launches the experiment using the user credentials at the
 .I "target cluster"
 and then fetches the results, placing them in a directory known by nix.
-Is the directory is not found, nix will issue a message to suggest the
+When the result derivation needs to be built, nix will look in this
-user to launch the experiment and it will fail to build the result
+directory for the results of the execution. If the directory is not
-derivation. When the result is successfully built by any user, the
+found, a message is printed suggesting that the user launch the
-derivation won't need to be rebuilt again until the experiment changes,
+experiment and the build process is stopped. When the
-as the hash only depends on the experiment and not on the contents of
+result is successfully built by any user, it is stored in the
-the results.
+.I "nix store"
 and it won't need to be rebuilt again until the experiment changes, as
 the hash only depends on the experiment and not on the contents of the
 results.
 .PP
 Notice that this mechanism violates the deterministic nature of the nix
 store, as from a given input (the experiment) we can generate different
 outputs (each result from different executions). We knowingly relaxed
 this restriction by providing a guarantee that the results are
 equivalent and there is no need to execute an experiment more than once.
 .PP
 To force the execution of an experiment you can use the
 .I rev
 attribute which is a number assigned to each experiment
 and can be incremented to create copies that differ only in that
 number. The experiment hash will change but the experiment will be the
 same, as long as the revision number is ignored along the execution
 stages.
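
The caching behaviour described in the document text above can be illustrated with a small sketch. It is a toy model and not how nix or garlic(1) are implemented: the names experiment_hash and ResultCache, the dictionary layout of an experiment, and the use of SHA-256 are all assumptions made for illustration. It only shows the three properties stated in the text: the key depends on the experiment and not on the result contents, a missing result stops the build with a hint to launch the experiment, and bumping rev yields a new key for an otherwise identical experiment.

```python
import hashlib
import json


def experiment_hash(experiment: dict) -> str:
    """Hash the experiment description only; result contents never enter the hash.

    Hypothetical helper: real nix derivation hashes are computed differently.
    """
    canonical = json.dumps(experiment, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class ResultCache:
    """Toy stand-in for the store: results are keyed by the experiment hash."""

    def __init__(self):
        self._fetched = {}  # stands in for the directory filled by the fetch step
        self._store = {}    # stands in for results that were already built

    def fetch(self, experiment: dict, result: str) -> None:
        """Record the result of one execution, as the manual fetch step would."""
        self._fetched[experiment_hash(experiment)] = result

    def build_result(self, experiment: dict) -> str:
        """Return the cached result, or build it from the fetched data."""
        key = experiment_hash(experiment)
        if key in self._store:
            # Already built by some user: no rebuild, even if new data was fetched.
            return self._store[key]
        if key not in self._fetched:
            # Mirrors the message that suggests launching the experiment.
            raise LookupError(f"no results for {key}: launch the experiment first")
        self._store[key] = self._fetched[key]
        return self._store[key]
```

In this sketch, a second execution of the same experiment does not replace the stored result, which is the relaxed determinism the text describes. Changing only the rev field produces a different key, so the build is requested again.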