WIP: postprocessing doc

2020-11-04 12:56:35 +01:00
parent 62c9da2474
commit f0122d557f
2 changed files with 96 additions and 37 deletions
--- a/garlic/doc/Makefile
+++ b/garlic/doc/Makefile
@@ -1,8 +1,14 @@
-all: execution.pdf execution.txt pp.pdf pp.txt
+all: execution.pdf execution.ascii pp.pdf pp.ascii
+
+TTYOPT=-rPO=4m -rLL=72m
+#TTYOPT=-dpaper=a0 -rPO=4m -rLL=72m

 %.pdf: %.ms
-	groff -ms -t -p -Tpdf $^ > $@
+	REFER=ref.i groff -ms -t -p -R -Tpdf $^ > $@
 	-killall -HUP mupdf

-%.txt: %.ms
-	groff -ms -t -p -Tutf8 $^ > $@
+%.utf8: %.ms
+	REFER=ref.i groff -ms -t -p -R $(TTYOPT) -Tutf8 $^ > $@
+
+%.ascii: %.ms
+	REFER=ref.i groff -ms -t -p -R $(TTYOPT) -Tascii $^ > $@
--- a/garlic/doc/pp.ms
+++ b/garlic/doc/pp.ms
@@ -1,75 +1,128 @@
 .TL
-Garlic: experiment results
+Garlic: the post-processing pipeline
 .AU
 Rodrigo Arias Mallo
 .AI
 Barcelona Supercomputing Center
+.AB
+.LP
+This document explains the stages that follow the execution of an
+experiment. We define the post-processing pipeline as the steps that go
+from the data generated by the experiment to a set of plots or tables
+that present the data in a human-readable form.
+.AE
 .\"#####################################################################
 .nr GROWPS 3
 .nr PSINCR 1.5p
 .\".nr PD 0.5m
 .nr PI 2m
-\".2C
+.\".2C
+.R1
+bracket-label " [" ] ", "
+accumulate
+.R2
 .\"#####################################################################
+.NH 1
+Introduction
+.LP
+After the correct execution of an experiment some measurements are
+recorded in the results for further investigation. Typically the time of
+the execution is measured and presented later in a plot or a table. The
+steps to analyze the results and present them in a convenient way are
+called the
+.I "post-processing pipeline" .
+Similarly to the execution pipeline
+.[
+garlic execution
+.]
+where several stages run sequentially, the
+post-processing pipeline is also formed by multiple stages executed in
+order.
+.PP
+The rationale behind dividing execution and post-processing is
+that experiments are usually costly to run (they take a long time to
+complete), while generating a plot usually takes much less time. Refining
+the plots several times from the same experimental results doesn't
+require running the complete experiment again, so the experimenter can
+try multiple ways to present the data in a rapid cycle.
+.NH 1
+Fetching the results
 .LP
 Consider a program of interest for which an experiment has been designed to
-measure some properties. When the experiment is executed, it will generate some
-results which are generally non-deterministic. The experimenter may want to
-present some information in a visual plot or graph based on these results.
-.PP
-In this escenario, the experiment depends on the program\[em]any
-changes in the program will cause nix to build the experiment again using the
-updated program. The results will also depend on the experiment, and
-the graph on the results. This chain of dependencies can be shown in
-the following dependency tree:
+measure some properties that the experimenter wants to present in a
+visual plot. When the experiment is launched, the execution
+pipeline (EP) runs to completion and generates some
+results. In this scenario, the execution pipeline depends on the
+program\[em]any change in the program will cause nix to build it again
+using the updated program. The results in turn depend on the
+execution pipeline, and the plot on the results. This chain of
+dependencies can be shown in the following dependency graph:
+.\"circlerad=0.22; arrowhead=7;
 .PS
 right
-circlerad=0.22; arrowhead=7;
 circle "Prog"
 arrow
-circle "Exp"
+circle "EP"
 arrow
 circle "Result"
 arrow
-circle "Graph"
+circle "PP"
+arrow
+circle "Plot"
 .PE
 Ideally, the dependencies should be handled by nix, so it can detect any
 change and rebuild the necessary parts automatically. Unfortunately, nix
-is not able to build R as a derivation directly as it requires access
+is not able to build the result as a derivation directly, as it requires access
 to the
 .I "target cluster"
-with several user accounts. In addition, the results are often
-non-deterministic so the graph G cannot depend on the content of the
-results.
-.PP
-In order to let several users use the results from a cache, we use the
+with several user accounts. In order to let several users reuse the same results from a cache, we
+use the
 .I "nix store"
-to make them available for read only. To generate the results from the
+to make them available. To generate the results from the
 experiment, we add some extra steps that must be executed manually.
 .PS
 right
 circlerad=0.22; arrowhead=7;
 circle "Prog"
 arrow
-E: circle "Exp"
-RUN: circle "Run" at E + (0.8,-0.5)
-FETCH: circle "Fetch" at E + (1.6,-0.5)
+E: circle "EP"
+RUN: circle "Run" at E + (0.8,-0.5) dashed
+FETCH: circle "Fetch" at E + (1.6,-0.5) dashed
 R: circle "Result" at E + (2.4,0)
 arrow
-G: circle "Graph"
+P: circle "PP"
+arrow
+circle "Plot"
 arrow dashed from E to RUN chop
 arrow dashed from RUN to FETCH chop
 arrow dashed from FETCH to R chop
 arrow from E to R chop
 .PE
 The run and fetch steps are provided by the helper tool
-.I garlic ,
-which launches the experiment using the user credential at the
+.I "garlic(1)" ,
+which launches the experiment using the user credentials at the
 .I "target cluster"
 and then fetches the results, placing them in a directory known by nix.
-Is the directory is not found, nix will issue a message to suggest the
-user to launch the experiment and it will fail to build the result
-derivation. When the result is successfully built by any user, the
-derivation won't need to be rebuilt again until the experiment changes,
-as the hash only depends on the experiment and not on the contents of
-the results.
+When the result derivation needs to be built, nix looks in this
+directory for the results of the execution. If the directory is not
+found, a message is printed suggesting that the user launch the
+experiment, and the build process stops. When the
+result is successfully built by any user, it is stored in the
+.I "nix store"
+and it won't need to be rebuilt until the experiment changes, as
+the hash only depends on the experiment and not on the contents of the
+results.
+.PP
+Notice that this mechanism violates the deterministic nature of the nix
+store, as from a given input (the experiment) we can generate different
+outputs (one result for each execution). We knowingly relaxed
+this restriction by providing a guarantee that the results are
+equivalent and there is no need to execute an experiment more than once.
+.PP
+To force the execution of an experiment you can use the
+.I rev
+attribute, a number assigned to each experiment that
+can be incremented to create copies that differ only in that
+number. The experiment hash will change but the experiment will be the
+same, as long as the revision number is ignored by the execution
+stages.
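
The caching rule described at the end of pp.ms (the hash depends only on the experiment, a missing results directory stops the build, and bumping rev yields a new hash) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how nix or garlic compute hashes or lay out directories: the names `experiment_hash` and `result_dir`, the JSON encoding, and the config keys are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def experiment_hash(config: dict) -> str:
    """Hash the experiment description only; results never enter the hash.

    Assumption: a sorted JSON dump stands in for the real derivation hash.
    """
    canonical = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:16]


def result_dir(cache: Path, config: dict) -> Path:
    """Return the fetched results for this experiment, or stop with a hint."""
    path = cache / experiment_hash(config)
    if not path.is_dir():
        raise FileNotFoundError(
            f"no results in {path}: run and fetch the experiment first"
        )
    return path


if __name__ == "__main__":
    # Hypothetical experiment description; "rev" plays the role of the
    # revision number discussed in the text.
    exp = {"program": "demo", "nodes": 2, "rev": 0}
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        # Simulate the manual run and fetch steps by creating the directory.
        (cache / experiment_hash(exp)).mkdir()
        found = result_dir(cache, exp)
        print("results found:", found.name == experiment_hash(exp))
        # Bumping rev changes the hash, so the cached results no longer match
        # and the experiment would have to be executed again.
        bumped = {**exp, "rev": 1}
        print("hash changed:", experiment_hash(bumped) != experiment_hash(exp))
```

Because the results are looked up by the experiment hash alone, two executions that produce different measurements map to the same entry, which is the relaxation of determinism the document acknowledges.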