848 Commits

Author SHA1 Message Date
3e197da8a3 hpcg: update figures and remove old ones 2021-04-19 16:05:10 +02:00
866d4561d3 hpcg: remove old experiments 2021-04-19 16:01:11 +02:00
9a88319153 hpcg: add granularity experiment 2021-04-19 16:00:55 +02:00
a96839d11a hpcg: merge weak scaling and add size experiment
The scaling.nix file defines both the strong and weak experiments by
using the parameter "enableStrong".
2021-04-19 15:57:31 +02:00
a71ae9c2c6 hpcg: avoid mismatching names for gen units 2021-04-16 16:15:16 +02:00
d490ef2694 hpcg: remove unused extrae.xml file 2021-04-16 16:14:48 +02:00
b4e37a15a9 hpcg: refactor ss and gen using a common file
- The file gen.nix now provides an experiment for each unit, to reduce
  the evaluation time.

- The pipeline is specified in the common.nix file only.

- The input dataset path is no longer symlinked, but is specified in the
  "--load" argument.

- The size is renamed to "sizePerTask" instead of "n".
2021-04-16 11:51:34 +02:00
9bb570af7f tools: add floatTruncate function 2021-04-16 11:49:37 +02:00
Raúl Peñacoba
4d629fe8f7 hpcg: remove old comments 2021-04-16 09:32:28 +02:00
Raúl Peñacoba
f5c8d0cb88 hpcg: choose a smaller strong scaling problem size 2021-04-16 09:32:28 +02:00
Raúl Peñacoba
cb6577b439 hpcg: add strongscaling
HPCG rounds a problem size axis when its value is < 16
2021-04-16 09:32:28 +02:00
Raúl Peñacoba
b60a46b683 hpcg: add weakscaling over some nblocks to check which axis is better 2021-04-16 09:32:28 +02:00
Raúl Peñacoba
1a6075a2b1 hpcg: add first granularity/scalability exps for tampi+isend+oss+task
- oss.nix runs valid hpcg layouts whereas slices.nix does not
2021-04-16 09:32:28 +02:00
12ff1fd506 garlicd: send logs to the builder 2021-04-16 09:29:33 +02:00
732b0c0e9c garlic tool: improve unit status information 2021-04-16 09:29:33 +02:00
64f077c4f6 stages: prepend the stage name to messages 2021-04-16 09:29:33 +02:00
7c94997023 control: add trap for bad exit 2021-04-16 09:29:33 +02:00
fb0dee4b61 exp: move exit1 experiment to slurm 2021-04-16 09:29:33 +02:00
bde54c69c5 sbatch: store queued status 2021-04-16 09:29:33 +02:00
2151e20bd6 exp: add exit1 experiment
Tests unit bad exits
2021-04-16 09:29:33 +02:00
886d16bcc6 garlic tool: add jq as dependency
So we can parse the experiment configuration in JSON
2021-04-16 09:29:33 +02:00
5c0f179830 stdexp: rename "name" to "clusterName" 2021-04-16 09:29:33 +02:00
422d359b48 script: stop on error by default 2021-04-16 09:29:33 +02:00
60248ab06b article: remove unused figures 2021-04-16 09:29:33 +02:00
1cb63b464d osu: adjust figures for publication 2021-04-16 09:29:33 +02:00
821b4f0d15 rplot: patch scales and fontconfig 2021-04-16 09:29:33 +02:00
0cf35decc5 osu: add mtu and eager experiments 2021-04-16 09:29:33 +02:00
26e3a86c78 garlic tool: check the presence of all the units
This check prevents a user from removing units between the
execution of the experiment and the fetch.
2021-04-16 09:29:33 +02:00
b96c39e0ba noise: add srun signal bug to the list 2021-04-16 09:29:33 +02:00
f842f1e01d slurm: add sigsegv experiment
Ensure that we can catch a sigsegv signal before and after the
MPI_Finalize call.
2021-04-16 09:29:33 +02:00
71c06d02da stages: add baywatch stage to check the exit code
This workaround stage prevents srun from returning 0 to the upper stages
when a signal happens after MPI_Finalize. It writes the return code to a
file named .srun.rc.$rank and later checks that it exists and contains a 0.

When the program is killed, it exits with a non-zero code and the error is
propagated to the baywatch stage, which aborts immediately without
creating the rc file.
2021-04-16 09:29:26 +02:00
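The rc-file check described in the baywatch commit can be sketched as follows. This is only an illustration in Python, not the stage itself, which is not shown in this log. The function names `write_rc` and `check_rc`, the `workdir` argument and the `nranks` argument are hypothetical; only the `.srun.rc.$rank` file name and the "exists and contains a 0" rule come from the commit message.

```python
from pathlib import Path


def write_rc(workdir: Path, rank: int, returncode: int) -> Path:
    """Record the exit code of one rank in .srun.rc.<rank> (hypothetical helper)."""
    rc_file = workdir / f".srun.rc.{rank}"
    rc_file.write_text(f"{returncode}\n")
    return rc_file


def check_rc(workdir: Path, nranks: int) -> bool:
    """Return True only if every rank left an rc file that contains 0.

    A missing file is treated as a failure: it means the rank died
    before the return code could be written.
    """
    for rank in range(nranks):
        rc_file = workdir / f".srun.rc.{rank}"
        if not rc_file.is_file():
            return False
        if rc_file.read_text().strip() != "0":
            return False
    return True
```

Treating a missing file the same as a non-zero code is the point of the workaround: the check does not rely on the exit status reported by srun.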
604cfd90a3 test: add sigsegv after MPI_Finalize test
The current srun version used in MN4 returns 0 if the program crashes
after MPI_Finalize, as shown by this test.
2021-04-16 09:28:02 +02:00
07253c3fa0 fwi: update figure index 2021-04-14 17:18:46 +02:00
eab323a13a fwi: update io figure 2021-04-14 17:18:24 +02:00
8ce2a68cd7 fwi: update strong scaling figure script 2021-04-14 17:16:12 +02:00
99c6196734 fwi: update granularity figure 2021-04-14 17:05:09 +02:00
dd75a840ce fwi: use enableIO instead of ioFreq 2021-04-12 20:09:17 +02:00
e49e3b087f fwi: rename big io experiment 2021-04-12 19:49:31 +02:00
59040d9355 fwi: fix inverted resources 2021-04-12 19:31:35 +02:00
6422741cb7 fwi: merge io experiments into one file
The enableExtended parameter controls whether the experiment runs with
multiple nodes or only one.
2021-04-12 19:27:45 +02:00
99beac9b23 fwi: generate the model in every node
As we are using local storage, we need a copy of the input in every
node. The current method is to run the generator only in the rank that
has CPU 0 assigned in its mask.
2021-04-12 19:01:10 +02:00
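The selection rule in the commit above (generate the input only in the rank whose mask contains CPU 0) can be sketched as below. This is an assumption-laden illustration in Python, not the code used by the experiment: `is_generator_rank` and `current_cpu_mask` are hypothetical names, and the sketch assumes that exactly one rank per node is bound to CPU 0.

```python
import os


def is_generator_rank(cpu_mask) -> bool:
    """Return True if this rank's CPU mask contains CPU 0."""
    return 0 in cpu_mask


def current_cpu_mask() -> set:
    """Return the set of CPUs the current process may run on.

    os.sched_getaffinity is only available on some platforms (e.g. Linux),
    so fall back to all CPUs when it is missing.
    """
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return set(range(os.cpu_count() or 1))
```

A rank would then run the generator only when `is_generator_rank(current_cpu_mask())` is true, which yields one copy of the input per node under the binding assumption stated above.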
58dc277d3d fwi: refactor ss-io with common.nix
Also, keep the names short and consistent.
2021-04-12 17:57:46 +02:00
47b326c646 fwi: generate the input at runtime 2021-04-12 17:46:07 +02:00
419e7f95cc fwi: avoid input generation
The ModelGenerator is now included in the fwi-params, so that the input
can be generated at runtime.
2021-04-12 17:43:30 +02:00
b0af9b8608 srun: add postSrun hook 2021-04-12 17:41:59 +02:00
4afda7dbfb fwi: use common.nix in sync_io experiment 2021-04-12 16:27:18 +02:00
02a103565c fwi: use common.nix in reuse experiment 2021-04-12 15:48:59 +02:00
788dd13ebd fwi: merge mpi pure experiment
The getResources function is used to assign the proper cpu binding
depending on the version. However, additional constraints are required to
ensure that we have enough points in Y.

By default the mpi+send+seq branch is disabled.
2021-04-12 15:37:39 +02:00
41665bc6fc fwi: refactor config generation into common.nix 2021-04-12 15:01:25 +02:00
9aa07993b2 fwi: refactor ss and granularity experiments
A common.nix file contains the shared stages
2021-04-12 14:41:26 +02:00