.\"usage: NS title
.EQ
delim $$
.EN
.de NS \" New Slide
.SK
.ev gp-top
.fam H
.vs 1.5m
.ll \\n[@ll]u
.lt \\n[@ll]u
.rs
.sp 2v
.ps +5
\\$*
.ps -5
.sp 1.5v
.br
.ev
..
.\" Remove headers
.de TP
..
.\" Bigger page number in footer
.de EOP
.fam H
.ps +2
. ie o .tl \\*[pg*odd-footer]
. el .tl \\*[pg*even-footer]
. ds hd*format \\g[P]
. af P 0
. ie (\\n[P]=1)&(\\n[N]=1) .tl \\*[pg*header]
. el .tl \\*[pg*footer]
. af P \\*[hd*format]
. tl ''\\*[Pg_type!\\n[@copy_type]]''
..
.\" Remove top and bottom margin
.VM 0 0
.\"
.\"
.\" Set virtual page dimensions for a physical size of 16x12 cm
.PGFORM 14c 12c 1c 1
.ND "January 14, 2021"
.\" .vs 1.5m
.S C 1.5m
.fam H
.\".PH "'cosas'''"
.COVER ms
.de cov@print-date
.DS C
.fam H
.B
\\*[cov*new-date]
.DE
..
.TL
.ps 20
.fam H
Garlic experiments
.AF "Barcelona Supercomputing Center"
.AU "Rodrigo Arias Mallo"
.COVEND
.PF "'''%'"
.\" Turn off justification
.SA 0
.\".PF '''%'
.\"==================================================================
.NS "Approach 1"
This was the approach proposed for hybrids PM:
.BL
.LI
Perform a granularity experiment with a \fIreasonable\fP problem size.
.LI
Take the best blocksize.
.LI
Analyze strong and weak scaling with that blocksize.
.LI
Plot speedup and efficiency comparing multiple PM.
.LE 1
The main problem is that it may lead to \fBbogus comparisons\fP.
Additionally, there is no guarantee that the best blocksize is still the
one that performs best with more resources.
.\"==================================================================
.NS "Approach 2"
We want to measure the scalability of the application \fBonly\fP, not
mixed with runtime overhead or lack of parallelism.
.P
We define \fBsaturation\fP as the state of an execution that allows a
program to potentially use all the resources (the name comes from the
transistor state, when current flows freely).
.P
We design a new experiment that tests multiple blocksizes and multiple
input sizes to find these states: \fBthe saturation experiment\fP.
.P
Begin with small problems and increase the size, so you get to the
answer quickly.
.\"==================================================================
.NS "Saturation experiment"
.2C
\X'pdf: pdfpic sat.png.tk.pdf -R 7c'
.NCOL
.S -1 -3
.BL 1m
.LI
The objective is to find the minimum input size that allows us to get
meaningful scalability results.
.LI
More precisely, a unit is in \fBsaturation state\fP if the median time
is below the \fBsaturation time limit\fP, currently set to 110% of the
minimum median time (red dashed lines).
.LI
An input size is in the \fBsaturation zone\fP if it allows at least K=3
consecutive points in the saturation state.
.LI
With fewer than 512 particles/CPU (green line) we cannot be sure that the
performance is not impacted by the runtime overhead or lack of
parallelism.
.LE
.S P P
.1C
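.\"==================================================================
.NS "Saturation experiment: sketch"
.S -1 -3
The two criteria of the saturation experiment can be sketched in Python.
This is only an illustration, not the Garlic implementation: the function
names and the sample timings are hypothetical, and taking the minimum
median over the blocksizes of a single input size is an assumption.
.DS L
.fam C
```python
from statistics import median

SATURATION_FACTOR = 1.10  # time limit: 110% of the minimum median time
K = 3                     # consecutive saturated points required


def saturated_units(times_by_blocksize):
    """Map each blocksize to True if its median time is below the limit."""
    medians = {bs: median(ts) for bs, ts in times_by_blocksize.items()}
    # Assumption: the minimum is taken over the blocksizes of this input size.
    limit = SATURATION_FACTOR * min(medians.values())
    return {bs: m < limit for bs, m in medians.items()}


def in_saturation_zone(times_by_blocksize, k=K):
    """True if at least k consecutive blocksizes are in saturation state."""
    state = saturated_units(times_by_blocksize)
    run = 0
    for bs in sorted(state):
        run = run + 1 if state[bs] else 0
        if run >= k:
            return True
    return False


# Hypothetical timings (seconds) for one input size, keyed by blocksize.
example = {
    64: [1.30, 1.32, 1.31],
    128: [1.05, 1.04, 1.06],
    256: [1.00, 1.01, 0.99],
    512: [1.02, 1.03, 1.02],
    1024: [1.50, 1.52, 1.49],
}
print(saturated_units(example))
print(in_saturation_zone(example))
```
.fam H
.DE
With these made-up timings, blocksizes 128, 256 and 512 are in the
saturation state, so the input size is in the saturation zone.
.S P P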
.\"==================================================================
.NS "Experiment space"
.2C
\X'pdf: pdfpic scaling-region.svg.tk.pdf -L 7c'
.NCOL
.S -1 -3
.BL 1m
.LI
\fBSaturation limit\fP: small tasks cannot be solved without overhead
from the runtime, no matter the blocksize.
.LI
Different limits for OmpSs-2 and OpenMP.
.LI
Experiment A will show the scaling of the app while in the saturation
zone.
.LI
Experiment B will show that OpenMP scales poorly in the last 2 points.
.LI
Experiment C will show that at some point both OpenMP and OmpSs-2 scale
poorly.
.LE
.S P P
.1C
.\"==================================================================
.NS "Experiment space: experiment C"
.2C
\X'pdf: pdfpic scalability.svg.tk.pdf -L 7c'
.NCOL
.BL 1m
.LI
Experiment C will show a difference in performance when approaching the
saturation limit.
.LI
We could say that OmpSs-2 introduces less overhead and therefore allows
better scalability.
.LE
.1C
.\"==================================================================
.NS "Reproducibility"
How easily can we get the same results? Three properties R0 < R1 < R2 (no common nomenclature yet!):
.BL 1m
.LI
R0: \fBSame\fP humans on the \fBsame\fP machine obtain the same result
.LI
R1: \fBDifferent\fP humans on the \fBsame\fP machine obtain the same result
.LI
R2: \fBDifferent\fP humans on a \fBdifferent\fP machine obtain the same result
.LE
.P
Garlic provides two types of properties, one for software and one for
experimental results:
.BL 1m
.LI
Software is R2: anyone can get the exact same software, on any
machine
.LI
Experimental results are R1: you cannot change the machine MN4 (yet)
.LE
.P
Same experimental result means that the mean of your results lies within
the confidence interval of our results \fBand the relative standard
deviation is below 1%\fP.
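.\"==================================================================
.NS "Reproducibility: sketch"
.S -1 -3
The "same experimental result" check can be sketched in Python. This is a
minimal illustration under stated assumptions, not how Garlic computes it:
the function names and timings are hypothetical, the confidence level
(95%) and the normal approximation are assumed because the slide does not
specify them, and the relative standard deviation is taken over the new
results.
.DS L
.fam C
```python
from statistics import NormalDist, mean, stdev

REL_STD_LIMIT = 0.01  # relative standard deviation must stay below 1%


def confidence_interval(samples, level=0.95):
    """Normal-approximation confidence interval for the mean (assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * stdev(samples) / len(samples) ** 0.5
    m = mean(samples)
    return m - half, m + half


def same_result(ours, yours, level=0.95):
    """Check both conditions for two sets of run times (seconds)."""
    low, high = confidence_interval(ours, level)
    # Assumption: the relative std is measured on the new results.
    rel_std = stdev(yours) / mean(yours)
    return low <= mean(yours) <= high and rel_std < REL_STD_LIMIT


# Hypothetical reference timings and three hypothetical reruns.
ours = [10.00, 10.02, 9.98, 10.01, 9.99]
print(same_result(ours, [10.01, 10.00, 10.02, 9.99, 10.01]))
print(same_result(ours, [10.5, 10.4, 10.6]))
print(same_result(ours, [9.0, 11.0, 10.0]))
```
.fam H
.DE
Only the first rerun passes: the second has its mean outside the
interval, and the third has the right mean but too much variation.
.S P P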