Commit Graph

66 Commits

Author SHA1 Message Date
64f077c4f6 stages: prepend the stage name to messages 2021-04-16 09:29:33 +02:00
7c94997023 control: add trap for bad exit 2021-04-16 09:29:33 +02:00
bde54c69c5 sbatch: store queued status 2021-04-16 09:29:33 +02:00
422d359b48 script: stop on error by default 2021-04-16 09:29:33 +02:00
71c06d02da stages: add baywatch stage to check the exit code
This workaround stage prevents srun from returning 0 to the upper stages
when a signal happens after MPI_Finalize. It writes the return code to a
file named .srun.rc.$rank and later checks that it exists and contains a 0.

When the program is killed, it exits with a non-zero code and the error is
propagated to the baywatch stage, which aborts immediately without
creating the rc file.
2021-04-16 09:29:26 +02:00
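
The rc-file check described in the baywatch commit message can be sketched roughly as follows. This is a minimal illustration, not the stage's actual code: the helper names `record_rc` and `check_rc` are hypothetical, and only the file name `.srun.rc.$rank` comes from the commit message.

```shell
# Hypothetical sketch of the baywatch idea; not the actual stage script.

# Run a command for a given rank and write its return code to
# .srun.rc.<rank>. If the command fails, return immediately and
# leave no rc file behind.
record_rc() {
    rank="$1"
    shift
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        return "$rc"
    fi
    echo "$rc" > ".srun.rc.$rank"
}

# Succeed only if the rc file for the rank exists and contains a 0.
check_rc() {
    rc_file=".srun.rc.$1"
    [ -f "$rc_file" ] || return 1
    [ "$(cat "$rc_file")" = "0" ]
}
```

With this shape, a missing rc file is treated as a failure, so a run that was killed is still detected even if the launcher itself reports 0.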
b0af9b8608 srun: add postSrun hook 2021-04-12 17:41:59 +02:00
87fa3bb336 sbatch: assert types to avoid silent parse errors 2021-03-19 16:37:31 +01:00
051a74b85d srun: allow commands to run before srun 2021-02-26 17:00:09 +01:00
8a77900201 srun: don't expand variables on install 2021-02-26 16:59:29 +01:00
ebcbf91fbe exec: allow manual specification of program path 2021-02-23 15:22:18 +01:00
e5561b8735 control: save total execution time 2021-02-08 14:14:08 +01:00
2b9c3da911 Add script stage 2021-01-12 18:19:49 +01:00
aeac1a6068 exec: Force newlines
Allow single line commands like pre="true"
2021-01-11 19:15:37 +01:00
130fe39c8e exec: Abort on error
We need to exit on the first error, as otherwise we cannot track a bad
execution when no exec is done (when post is not empty).
2021-01-11 18:29:30 +01:00
7d4db6b6de control: Exit on error
This prevents srun from silently returning with an error, without
actually queueing the job of a run.
2020-12-07 16:33:40 +01:00
1bdeca9e7d unit: Remove dangerous slash from index names 2020-12-03 16:33:48 +01:00
c858f521bf isolate: add $TMPDIR in the namespace 2020-12-03 13:22:10 +01:00
da4bbf8533 isolate: only load some files from /etc 2020-12-03 12:04:51 +01:00
f87d830218 isolate: preserve TERM 2020-12-02 13:06:55 +01:00
3d352fee19 isolate: allow argument passing 2020-12-02 13:06:35 +01:00
1f841649f8 exec: add support for nixPrefix 2020-12-02 11:57:40 +01:00
a147a396d9 trebuchet: add the experiment as attribute 2020-11-20 15:35:36 +01:00
8bc5656461 tools: recursive getExperiment
It allows getExperimentStage to be called from any stage above the
experiment.
2020-11-20 15:34:14 +01:00
d192a59fdc control: Export the run iteration 2020-11-20 15:32:41 +01:00
734d494d96 stdexp: Allow extra mounts 2020-11-20 15:30:47 +01:00
David Alvarez
0c438d4dac Setup for test experiment 2020-11-20 13:57:12 +01:00
e8f649327a exec: Avoid variable expansion at build
All bash variables passed in env, pre or post are now expanded at
execution time.
2020-11-20 13:54:45 +01:00
e1e34ddf75 exec: add pre and post code to allow cleanup tasks 2020-11-17 16:09:38 +01:00
641e752bd5 Add a trace message at unit evaluation 2020-11-17 11:12:12 +01:00
317409f6ac Move index and out inside the user directory 2020-11-03 19:10:00 +01:00
5e2797bcde Create index files for the experiments 2020-11-03 19:10:00 +01:00
efd7df068e Print full experiment path 2020-11-03 19:10:00 +01:00
3bd4e61f3f WIP: Testing with automatic fetching 2020-11-03 19:09:59 +01:00
59346fa97e control: Add status file 2020-11-03 19:09:59 +01:00
4beb069627 WIP: postprocessing pipeline
Now each run is executed in an independent folder
2020-11-03 19:09:59 +01:00
2680dcb66f Don't nest the unit results
The experiment directory now contains symlinks to the units, keeping the
old structure. The unit results are directly placed in the garlic out
directory.
2020-11-03 19:09:58 +01:00
c3659d316d Add perf stage 2020-11-03 19:09:58 +01:00
80ccd1240a Less verbose execution 2020-10-14 16:29:22 +02:00
9d8f7d9074 Print the experiment being run 2020-10-14 16:28:27 +02:00
c7d2e2d866 Write the unit config in a file 2020-10-14 16:27:47 +02:00
7a37913b4e Set the ssh host from the machine config 2020-10-13 14:30:03 +02:00
a38ff31cca Introduce the runexp stage 2020-10-13 13:00:59 +02:00
6ab448b10a Fix trebuchet description 2020-10-09 20:28:00 +02:00
4de20d3aa5 Remove old stages and update some 2020-10-09 20:12:52 +02:00
27bc977590 Remove strace from isolate stage 2020-10-09 19:50:28 +02:00
332b738889 Move apps into garlic/apps 2020-10-09 16:42:06 +02:00
a576be8031 WIP stage redesign 2020-10-09 16:42:06 +02:00
654e243735 Include an index in the trebuchet 2020-10-09 16:42:06 +02:00
45afe7d391 Simplify experiment stage 2020-10-09 16:42:06 +02:00
d599b8c52f New naming convention 2020-10-09 16:42:06 +02:00