forked from rarias/jungle
f69629d2da0e46dcd64e70fa7454277bf8aea3f8
A failure to reach the control node can cause slurmd to fail, and the
unit then remains in the failed state until it is manually restarted.
Instead, try to restart the service every 30 seconds, forever:
    owl1% systemctl show slurmd | grep -E 'Restart=|RestartUSec='
    Restart=on-failure
    RestartUSec=30s
    owl1% pgrep slurmd
    5903
    owl1% sudo kill -SEGV 5903
    owl1% pgrep slurmd
    6137
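For reference, a minimal sketch of a NixOS module fragment that would produce the settings shown above. It is an illustration, not the actual diff of this commit: it assumes the unit is named slurmd (as in the transcript) and that no other module sets these keys with higher priority, in which case something like lib.mkForce might be needed.

```nix
{ ... }:
{
  # Assumption: the unit name "slurmd" matches the one queried with
  # `systemctl show slurmd` in the transcript above.
  systemd.services.slurmd.serviceConfig = {
    # Restart on non-zero exit codes, fatal signals (such as SEGV) and timeouts.
    Restart = "on-failure";
    # Seconds to wait before each restart attempt; reported as RestartUSec=30s.
    RestartSec = 30;
  };
}
```

With systemd's default start rate limit (5 starts within 10 seconds), a 30 second delay between attempts should never trip the limit, which is consistent with the "forever" behavior described above.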
Fixes: rarias/jungle#177
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Description
Configuration for NixOS machines.