forked from rarias/jungle
		
	examples: Add granularity examples
This commit is contained in:
		
							parent
							
								
									f0ae0df341
								
							
						
					
					
						commit
						c41456412c
					
				
							
								
								
									
										188
									
								
								garlic/exp/examples/granularity.nix
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										188
									
								
								garlic/exp/examples/granularity.nix
									
									
									
									
									
										Normal file
									
								
							| @ -0,0 +1,188 @@ | |||||||
|  | # This file defines an experiment. It is designed as a function that takes | ||||||
|  | # several parameters and returns a derivation. This derivation, when built will | ||||||
|  | # create several scripts that can be executed and launch the experiment. | ||||||
|  | 
 | ||||||
|  | # These are the inputs to this function: an attribute set which must contain the | ||||||
|  | # following keys: | ||||||
|  | { | ||||||
|  |   stdenv | ||||||
|  | , stdexp | ||||||
|  | , bsc | ||||||
|  | , targetMachine | ||||||
|  | , stages | ||||||
|  | , garlicTools | ||||||
|  | }: | ||||||
|  | 
 | ||||||
|  | # We import in the scope the content of the `stdenv.lib` attribute, which | ||||||
|  | # contain useful functions like `toString`, which will be used later. This is | ||||||
|  | # handy to avoid writting `stdenv.lib.tostring`. | ||||||
|  | 
 | ||||||
|  | with stdenv.lib; | ||||||
|  | 
 | ||||||
|  | # We also have some functions specific to the garlic benchmark which we import | ||||||
|  | # as well. Take a look at the garlic/tools.nix file for more details. | ||||||
|  | with garlicTools; | ||||||
|  | 
 | ||||||
|  | # The `let` keyword allows us to define some local variables which will be used | ||||||
|  | # later. It works as the local variable concept in the C language. | ||||||
|  | let | ||||||
|  | 
 | ||||||
|  |   # Initial variable configuration: every attribute in this set contains lists | ||||||
|  |   # of options which will be used to compute the configuration of the units. The | ||||||
|  |   # cartesian product of all the values will be computed. | ||||||
|  |   varConf = { | ||||||
|  |     # In this case we will vary the columns and rows of the blocksize. This | ||||||
|  |     # configuration will create 3 x 2 = 6 units. | ||||||
|  |     cbs = [ 256 1024 4096 ]; | ||||||
|  |     rbs = [ 512 1024 ]; | ||||||
|  |   }; | ||||||
|  | 
 | ||||||
|  |   # Generate the complete configuration for each unit: genConf is a function | ||||||
|  |   # that accepts the argument `c` and returns a attribute set. The attribute set | ||||||
|  |   # is formed by joining the configuration of the machine (which includes | ||||||
|  |   # details like the number of nodes or the architecture) and the configuration | ||||||
|  |   # that we define for our units. | ||||||
|  |   # | ||||||
|  |   # Notice the use of the `rec` keyword, which allows us to access the elements | ||||||
|  |   # of the set while is being defined. | ||||||
|  |   genConf = c: targetMachine.config // rec { | ||||||
|  | 
 | ||||||
|  |     # These attributes are user defined, and thus the user will need to handle | ||||||
|  |     # them manually. They are not read by the standard pipeline: | ||||||
|  | 
 | ||||||
|  |     # Here we load the `hw` attribute from the machine configuration, so we can | ||||||
|  |     # access it, for example, the number of CPUs per socket as hw.cpusPerSocket. | ||||||
|  |     hw = targetMachine.config.hw; | ||||||
|  | 
 | ||||||
|  |     # These options will be used by the heat app, be we write them here so they | ||||||
|  |     # are stored in the unit configuration. | ||||||
|  |     timesteps = 10; | ||||||
|  |     cols = 1024 * 16; # Columns | ||||||
|  |     rows = 1024 * 16; # Rows | ||||||
|  | 
 | ||||||
|  |     # The blocksize is set to the values passed in the `c` parameter, which will | ||||||
|  |     # be set to one of all the configurations of the cartesian product. for | ||||||
|  |     # example: cbs = 256 and rbs = 512. | ||||||
|  |     # We can also write `inherit (c) cbs rbs`, as is a shorthand notation. | ||||||
|  |     cbs = c.cbs; | ||||||
|  |     rbs = c.rbs; | ||||||
|  | 
 | ||||||
|  |     # The git branch is specified here as well, as will be used when we specify | ||||||
|  |     # the heat app | ||||||
|  |     gitBranch = "garlic/tampi+isend+oss+task"; | ||||||
|  | 
 | ||||||
|  |     # ------------------------------------------------------------------------- | ||||||
|  | 
 | ||||||
|  |     # These attributes are part of the standard pipeline, and are required for | ||||||
|  |     # each experiment. They are automatically recognized by the standard | ||||||
|  |     # execution pipeline. | ||||||
|  | 
 | ||||||
|  |     # The experiment name: | ||||||
|  |     expName = "example-granularity-heat"; | ||||||
|  | 
 | ||||||
|  |     # The experimental unit name. It will be used to create a symlink in the | ||||||
|  |     # index (at /gpfs/projects/bsc15/garlic/$USER/index/) so you can easily find | ||||||
|  |     # the unit. Notice that the symlink is overwritten each time you run a unit | ||||||
|  |     # with the same same. | ||||||
|  |     # | ||||||
|  |     # We use the toString function to convert the numeric value of cbs and rbs | ||||||
|  |     # to a string like: "example-granularity-heat.cbs-256.rbs-512" | ||||||
|  |     unitName = expName + | ||||||
|  |       ".cbs-${toString cbs}" + | ||||||
|  |       ".rbs-${toString rbs}"; | ||||||
|  |      | ||||||
|  |     # Repeat the execution of each unit a few times: this option is | ||||||
|  |     # automatically taken by the experiment, which will repeat the execution of | ||||||
|  |     # the program that many times. It is recommended to run the app at least 30 | ||||||
|  |     # times, but we only used 10 here for demostration purposes (as it will be | ||||||
|  |     # faster to run) | ||||||
|  |     loops = 10; | ||||||
|  | 
 | ||||||
|  |     # Resources: here we configure the resources in the machine. The queue to be | ||||||
|  |     # used is `debug` as is the fastest for small jobs. | ||||||
|  |     qos = "debug"; | ||||||
|  | 
 | ||||||
|  |     # Then the number of MPI processes or tasks per node: | ||||||
|  |     ntasksPerNode = 1; | ||||||
|  | 
 | ||||||
|  |     # And the number of nodes: | ||||||
|  |     nodes = 1; | ||||||
|  | 
 | ||||||
|  |     # We use all the CPUs available in one socket to each MPI process or task. | ||||||
|  |     # Notice that the number of CPUs per socket is not specified directly. but | ||||||
|  |     # loaded from the configuration of the machine that will be used to run our | ||||||
|  |     # experiment. The affinity mask is set accordingly. | ||||||
|  |     cpusPerTask = hw.cpusPerSocket; | ||||||
|  | 
 | ||||||
|  |     # The time will limit the execution of the program in case of a deadlock | ||||||
|  |     time = "02:00:00"; | ||||||
|  | 
 | ||||||
|  |     # The job name will appear in the `squeue` and helps to identify what is | ||||||
|  |     # running. Currently is set to the name of the unit. | ||||||
|  |     jobName = unitName; | ||||||
|  |   }; | ||||||
|  | 
 | ||||||
|  |   # Using the `varConf` and our function `genConf` we compute a list of the | ||||||
|  |   # complete configuration of every unit. | ||||||
|  |   configs = stdexp.buildConfigs { | ||||||
|  |     inherit varConf genConf; | ||||||
|  |   }; | ||||||
|  | 
 | ||||||
|  |   # Now that we have the list of configs, we need to write how that information | ||||||
|  |   # is used to run our program. In our case we will use some params such as the | ||||||
|  |   # number of rows and columns of the input problem or the blocksize as argv | ||||||
|  |   # values. | ||||||
|  | 
 | ||||||
|  |   # The exec stage is used to run a program with some arguments. | ||||||
|  |   exec = {nextStage, conf, ...}: stages.exec { | ||||||
|  |     # All stages require the nextStage attribute, which is passed as parameter. | ||||||
|  |     inherit nextStage; | ||||||
|  | 
 | ||||||
|  |     # Then, we fill the argv array with the elements that will be used when | ||||||
|  |     # running our program. Notice that we load the attributes from the | ||||||
|  |     # configuration which is passed as argument as well. | ||||||
|  |     argv = [ | ||||||
|  |       "--rows" conf.rows | ||||||
|  |       "--cols" conf.cols | ||||||
|  |       "--rbs" conf.rbs | ||||||
|  |       "--cbs" conf.cbs | ||||||
|  |       "--timesteps" conf.timesteps | ||||||
|  |     ]; | ||||||
|  | 
 | ||||||
|  |     # This program requires a file called `head.conf` in the current directory. | ||||||
|  |     # To do it, we run this small script in the `pre` hook, which simple runs | ||||||
|  |     # some commands before running the program. Notice that this command is | ||||||
|  |     # executed in every MPI task. | ||||||
|  |     pre = '' | ||||||
|  |       ln -sf ${nextStage}/etc/heat.conf heat.conf || true | ||||||
|  |     ''; | ||||||
|  |   }; | ||||||
|  | 
 | ||||||
|  |   # The program stage is only used to specify which program we should run. | ||||||
|  |   # We use this stage to specify build-time parameters such as the gitBranch, | ||||||
|  |   # which will be used to fetch the source code. We use the `override` function | ||||||
|  |   # of the `bsc.garlic.apps.heat` derivation to change the input paramenters. | ||||||
|  |   program = {nextStage, conf, ...}: bsc.garlic.apps.heat.override { | ||||||
|  |     inherit (conf) gitBranch; | ||||||
|  |   }; | ||||||
|  | 
 | ||||||
|  |   # Other stages may be defined here, in case that we want to do something | ||||||
|  |   # additional, like running the program under `perf stats` or set some | ||||||
|  |   # envionment variables. | ||||||
|  | 
 | ||||||
|  |   # Once all the stages are defined, we build the pipeline array. The | ||||||
|  |   # `stdexp.stdPipeline` contains the standard pipeline stages, so we don't need | ||||||
|  |   # to specify them. We only specify how we run our program, and what program | ||||||
|  |   # exactly, by adding our `exec` and `program` stages: | ||||||
|  |   pipeline = stdexp.stdPipeline ++ [ exec program ]; | ||||||
|  | 
 | ||||||
|  | # Then, we use the `configs` and the `pipeline` just defined inside the `in` | ||||||
|  | # part, to build the complete experiment: | ||||||
|  | in | ||||||
|  | 
 | ||||||
|  |   # The `stdexp.genExperiment` function generates an experiment by calling every | ||||||
|  |   # stage of the pipeline with the different configs, and thus creating | ||||||
|  |   # different units. The result is the top level derivation which is the | ||||||
|  |   # `trebuchet`, which is the script that, when executed, launches the complete | ||||||
|  |   # experiment. | ||||||
|  |   stdexp.genExperiment { inherit configs pipeline; } | ||||||
| @ -103,4 +103,8 @@ | |||||||
|     impi = callPackage ./osu/impi.nix { }; |     impi = callPackage ./osu/impi.nix { }; | ||||||
|     bwShm = bw.override { interNode = false; }; |     bwShm = bw.override { interNode = false; }; | ||||||
|   }; |   }; | ||||||
|  | 
 | ||||||
|  |   examples = { | ||||||
|  |     granularity = callPackage ./examples/granularity.nix { }; | ||||||
|  |   }; | ||||||
| } | } | ||||||
|  | |||||||
							
								
								
									
										194
									
								
								garlic/fig/examples/granularity.R
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										194
									
								
								garlic/fig/examples/granularity.R
									
									
									
									
									
										Normal file
									
								
							| @ -0,0 +1,194 @@ | |||||||
|  | # This R program takes as argument the dataset that contains the results of the | ||||||
|  | # execution of the heat example experiment and produces some plots. All the | ||||||
|  | # knowledge to understand how this script works is covered by this nice R book: | ||||||
|  | # | ||||||
|  | # Winston Chang, R Graphics Cookbook: Practical Recipes for Visualizing Data, | ||||||
|  | # O’Reilly Media (2020). 2nd edition | ||||||
|  | # | ||||||
|  | # Which can be freely read it online here: https://r-graphics.org/ | ||||||
|  | # | ||||||
|  | # Please, search in this book before copying some random (and probably oudated) | ||||||
|  | # reply on stack overflow. | ||||||
|  | 
 | ||||||
|  | # We load some R packages to import the required functions. We mainly use the | ||||||
|  | # tidyverse packages, which are very good for ploting data, | ||||||
|  | library(ggplot2) | ||||||
|  | library(dplyr, warn.conflicts = FALSE) | ||||||
|  | library(scales) | ||||||
|  | library(jsonlite) | ||||||
|  | library(viridis, warn.conflicts = FALSE) | ||||||
|  | 
 | ||||||
|  | # Here we simply load the arguments to find the input dataset. If nothing is | ||||||
|  | # specified we use the file named `input` in the current directory. | ||||||
|  | # We can run this script directly using: | ||||||
|  | # Rscript <path-to-this-script> <input-dataset> | ||||||
|  | 
 | ||||||
|  | # Load the arguments (argv) | ||||||
|  | args = commandArgs(trailingOnly=TRUE) | ||||||
|  | 
 | ||||||
|  | # Set the input dataset if given in argv[1], or use "input" as default | ||||||
|  | if (length(args)>0) { input_file = args[1] } else { input_file = "input" } | ||||||
|  | 
 | ||||||
|  | # Here we build of dataframe from the input dataset by chaining operations using | ||||||
|  | # the magritte operator `%>%`, which is similar to a UNIX pipe. | ||||||
|  | # First we read the input file, which is expected to be NDJSON | ||||||
|  | df = jsonlite::stream_in(file(input_file), verbose=FALSE) %>% | ||||||
|  | 
 | ||||||
|  |   # Then we flatten it, as it may contain dictionaries inside the columns | ||||||
|  |   jsonlite::flatten() %>% | ||||||
|  |    | ||||||
|  |   # Now the dataframe contains all the configuration of the units inside the | ||||||
|  |   # columns named `config.*`, for example `config.cbs`. We first select only | ||||||
|  |   # the columns that we need: | ||||||
|  |   select(config.cbs, config.rbs, unit, time) %>% | ||||||
|  | 
 | ||||||
|  |   # And then we rename those columns to something shorter: | ||||||
|  |   rename(cbs=config.cbs, rbs=config.rbs) %>% | ||||||
|  | 
 | ||||||
|  |   # The columns contain the values that we specified in the experiment as | ||||||
|  |   # integers. However, we need to tell R that those values are factors. So we | ||||||
|  |   # apply to those columns the `as.factor()` function: | ||||||
|  |   mutate(cbs = as.factor(cbs)) %>% | ||||||
|  |   mutate(rbs = as.factor(rbs)) %>% | ||||||
|  | 
 | ||||||
|  |   # The same for the unit (which is the hash that nix has given to each unit) | ||||||
|  |   mutate(unit = as.factor(unit)) %>% | ||||||
|  | 
 | ||||||
|  |   # Then, we can group our dataset by each unit. This will always work | ||||||
|  |   # independently of the variables that vary from unit to unit. | ||||||
|  |   group_by(unit) %>% | ||||||
|  | 
 | ||||||
|  |   # And compute some metrics which are applied to each group. For example we | ||||||
|  |   # compute the median time within the runs of a unit: | ||||||
|  |   mutate(median.time = median(time)) %>% | ||||||
|  |   mutate(normalized.time = time / median.time - 1) %>% | ||||||
|  |   mutate(log.median.time = log(median.time)) %>% | ||||||
|  | 
 | ||||||
|  |   # Then, we remove the grouping. This step is very important, otherwise the | ||||||
|  |   # plotting functions get confused: | ||||||
|  |   ungroup() | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # These constants will be used when creating the plots. We use high quality | ||||||
|  | # images with 300 dots per inch and 5 x 5 inches of size by default. | ||||||
|  | dpi = 300 | ||||||
|  | h = 5 | ||||||
|  | w = 5 | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # --------------------------------------------------------------------- | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # We plot the median time (of each unit) as we vary the block size. As we vary | ||||||
|  | # both the cbs and rbs, we plot cbs while fixing rbs at a time. | ||||||
|  | p = ggplot(df, aes(x=cbs, y=median.time, color=rbs)) + | ||||||
|  |   # We add a point to the median | ||||||
|  |   geom_point() + | ||||||
|  | 
 | ||||||
|  |   # We also add the lines to connect the points. We need to specify which | ||||||
|  |   # variable will do the grouping, otherwise we will have one line per point. | ||||||
|  |   geom_line(aes(group=rbs)) + | ||||||
|  | 
 | ||||||
|  |   # The bw theme is recommended for publications | ||||||
|  |   theme_bw() + | ||||||
|  | 
 | ||||||
|  |   # Here we add the title and the labels of the axes | ||||||
|  |   labs(x="cbs", y="Median time (s)", title="Heat granularity: median time",  | ||||||
|  |     subtitle=input_file) +  | ||||||
|  | 
 | ||||||
|  |   # And set the subtitle font size a bit smaller, so it fits nicely | ||||||
|  |   theme(plot.subtitle=element_text(size=8)) | ||||||
|  | 
 | ||||||
|  | # Then, we save the plot both in png and pdf | ||||||
|  | ggsave("median.time.png", plot=p, width=w, height=h, dpi=dpi) | ||||||
|  | ggsave("median.time.pdf", plot=p, width=w, height=h, dpi=dpi) | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # --------------------------------------------------------------------- | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # Another interesting plot is the normalized time, which shows the variance of | ||||||
|  | # the execution times, and can be used to find problems: | ||||||
|  | p = ggplot(df, aes(x=cbs, y=normalized.time)) + | ||||||
|  | 
 | ||||||
|  |   # The boxplots are useful to identify outliers and problems with the | ||||||
|  |   # distribution of time | ||||||
|  |   geom_boxplot() + | ||||||
|  | 
 | ||||||
|  |   # We add a line to mark the 1% limit above and below the median  | ||||||
|  |   geom_hline(yintercept=c(-0.01, 0.01), linetype="dashed", color="red") + | ||||||
|  | 
 | ||||||
|  |   # We split the plot into subplots, one for each value of the rbs column | ||||||
|  |   facet_wrap(~ rbs) + | ||||||
|  | 
 | ||||||
|  |   # The bw theme is recommended for publications | ||||||
|  |   theme_bw() + | ||||||
|  | 
 | ||||||
|  |   # Here we add the title and the labels of the axes | ||||||
|  |   labs(x="cbs", y="Normalized time", title="Heat granularity: normalized time",  | ||||||
|  |     subtitle=input_file) +  | ||||||
|  | 
 | ||||||
|  |   # And set the subtitle font size a bit smaller, so it fits nicely | ||||||
|  |   theme(plot.subtitle=element_text(size=8)) | ||||||
|  | 
 | ||||||
|  | # Then, we save the plot both in png and pdf | ||||||
|  | ggsave("normalized.time.png", plot=p, width=w, height=h, dpi=dpi) | ||||||
|  | ggsave("normalized.time.pdf", plot=p, width=w, height=h, dpi=dpi) | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # --------------------------------------------------------------------- | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # We plot the time of each run as we vary the block size | ||||||
|  | p = ggplot(df, aes(x=cbs, y=time, color=rbs)) + | ||||||
|  | 
 | ||||||
|  |   # We add a points (scatter plot) using circles (shape=21) a bit larger | ||||||
|  |   # than the default (size=3)  | ||||||
|  |   geom_point(shape=21, size=3) + | ||||||
|  | 
 | ||||||
|  |   # The bw theme is recommended for publications | ||||||
|  |   theme_bw() + | ||||||
|  | 
 | ||||||
|  |   # Here we add the title and the labels of the axes | ||||||
|  |   labs(x="cbs", y="Time (s)", title="Heat granularity: time",  | ||||||
|  |     subtitle=input_file) +  | ||||||
|  | 
 | ||||||
|  |   # And set the subtitle font size a bit smaller, so it fits nicely | ||||||
|  |   theme(plot.subtitle=element_text(size=8)) | ||||||
|  | 
 | ||||||
|  | # Then, we save the plot both in png and pdf | ||||||
|  | ggsave("time.png", plot=p, width=w, height=h, dpi=dpi) | ||||||
|  | ggsave("time.pdf", plot=p, width=w, height=h, dpi=dpi) | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # --------------------------------------------------------------------- | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # We can also plot both cbs and rbs in each dimension by mapping the time with a | ||||||
|  | # color. The `fill` argument instruct R to use the `median.time` as color | ||||||
|  | p = ggplot(df, aes(x=cbs, y=rbs, fill=median.time)) + | ||||||
|  | 
 | ||||||
|  |   # Then we use the geom_raster method to paint rectangles filled with color | ||||||
|  |   geom_raster() +  | ||||||
|  | 
 | ||||||
|  |   # The colors are set using the viridis package, using the plasma palete. Those | ||||||
|  |   # colors are designed to be safe for color impaired people and also when | ||||||
|  |   # converting the figures to grayscale. | ||||||
|  |   scale_fill_viridis(option="plasma") + | ||||||
|  | 
 | ||||||
|  |   # We also force each tile to be an square | ||||||
|  |   coord_fixed() + | ||||||
|  | 
 | ||||||
|  |   # The bw theme is recommended for publications | ||||||
|  |   theme_bw() + | ||||||
|  | 
 | ||||||
|  |   # Here we add the title and the labels of the axes | ||||||
|  |   labs(x="cbs", y="rbs", title="Heat granularity: time",  | ||||||
|  |     subtitle=input_file) +  | ||||||
|  | 
 | ||||||
|  |   # And set the subtitle font size a bit smaller, so it fits nicely | ||||||
|  |   theme(plot.subtitle=element_text(size=8)) | ||||||
|  | 
 | ||||||
|  | # Then, we save the plot both in png and pdf | ||||||
|  | ggsave("time.heatmap.png", plot=p, width=w, height=h, dpi=dpi) | ||||||
|  | ggsave("time.heatmap.pdf", plot=p, width=w, height=h, dpi=dpi) | ||||||
| @ -74,4 +74,8 @@ in | |||||||
|     "osu/bwShm"       = osu.bwShm; |     "osu/bwShm"       = osu.bwShm; | ||||||
|     "heat/cache"      = heat.cache; |     "heat/cache"      = heat.cache; | ||||||
|   }; |   }; | ||||||
|  | 
 | ||||||
|  |   examples = with exp.examples; { | ||||||
|  |     granularity = stdPlot ./examples/granularity.R [ granularity ]; | ||||||
|  |   }; | ||||||
| } | } | ||||||
|  | |||||||
		Loading…
	
	
			
			x
			
			
		
	
		Reference in New Issue
	
	Block a user