Rodrigo Arias Mallo dc4e2be9de Restart slurmd on failure
A failure to reach the control node can cause slurmd to fail, and the
unit then remains in the failed state until it is manually restarted.
Instead, try to restart the service every 30 seconds, forever:

    owl1% systemctl show slurmd | grep -E 'Restart=|RestartUSec='
    Restart=on-failure
    RestartUSec=30s
    owl1% pgrep slurmd
    5903
    owl1% sudo kill -SEGV 5903
    owl1% pgrep slurmd
    6137
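
For reference, a minimal sketch of how these two properties could be set
from a NixOS module. It assumes the slurmd unit is already defined
elsewhere (for example by the NixOS slurm module) and is not necessarily
the exact change made in this commit:

```nix
{ ... }:
{
  # Assumption: systemd.services.slurmd already exists; these settings are
  # merged into the [Service] section of the generated unit.
  systemd.services.slurmd.serviceConfig = {
    # Restart on unclean exit codes, unclean signals (such as SEGV),
    # timeouts and watchdog expiry, but not after a clean stop.
    Restart = "on-failure";
    # Seconds to wait before each restart attempt.
    RestartSec = 30;
  };
}
```

systemctl show reports RestartSec as the RestartUSec property, which is
why the check above greps for RestartUSec=.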

Fixes: #177
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:18 +02:00
Description
Configuration for NixOS machines.
https://jungle.bsc.es/
88 MiB
Languages
Nix 87.7%
C++ 7.1%
Shell 2.5%
Python 1.4%
CSS 0.6%
Other 0.5%