forked from rarias/bscpkgs
stages: add baywatch stage to check the exit code
This workaround stage prevents srun from returning 0 to the upper stages when a signal arrives after MPI_Finalize. It writes the return code to a file named .srun.rc.$rank, and a later check verifies that the file exists and contains 0. When the program is killed, it exits with a non-zero code and the error propagates to the baywatch stage, which aborts immediately without creating the rc file, so the check fails.
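The rc-file mechanism described in the commit message can be modelled in a few lines. The following is a minimal Python sketch under stated assumptions, not the actual stage: the real baywatch stage is defined elsewhere in bscpkgs and is not part of this diff. Here a subprocess stands in for the next stage, baywatch writes .srun.rc.<rank> only on a clean exit, and all_ranks_ok is the later check. The function names, the nranks parameter and the RC_NAME constant are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical name pattern, following the commit message (".srun.rc.$rank").
RC_NAME = ".srun.rc.{rank}"


def baywatch(cmd, rank, workdir):
    """Run the next stage for one rank; record the exit code only if it is 0.

    A non-zero exit (subprocess reports death by signal as a negative
    return code) returns at once WITHOUT creating the rc file, modelling
    the stage "aborting immediately".
    """
    rc = subprocess.run(cmd).returncode
    if rc != 0:
        return rc
    with open(os.path.join(workdir, RC_NAME.format(rank=rank)), "w") as f:
        f.write(f"{rc}\n")
    return rc


def all_ranks_ok(workdir, nranks):
    """Later check: every rank's rc file must exist and contain 0.

    This catches the case where srun itself reported 0 even though some
    rank did not finish cleanly.
    """
    for rank in range(nranks):
        path = os.path.join(workdir, RC_NAME.format(rank=rank))
        if not os.path.exists(path):
            return False
        with open(path) as f:
            if f.read().strip() != "0":
                return False
    return True


if __name__ == "__main__":
    ok = [sys.executable, "-c", "pass"]
    # Simulates a rank that exits non-zero (e.g. after being killed).
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    with tempfile.TemporaryDirectory() as d:
        baywatch(ok, 0, d)
        baywatch(failing, 1, d)  # no .srun.rc.1 is written
        print(all_ranks_ok(d, 2))  # False
```

The check is deliberately based on the presence of the file rather than on srun's own exit status, since srun's status is exactly what cannot be trusted when the signal arrives after MPI_Finalize.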
@@ -99,13 +99,17 @@ rec {
       inherit nextStage;
     }
   );
+
+  baywatch = {nextStage, ...}: stages.baywatch {
+    inherit nextStage;
+  };
 };
 
 stdPipelineOverride = {overrides ? {}}:
   let
     stages = stdStages // overrides;
   in
-    with stages; [ sbatch isolate control srun isolate ];
+    with stages; [ sbatch isolate control srun isolate baywatch ];
 
 stdPipeline = stdPipelineOverride {};
 