A failure to reach the control node can cause slurmd to fail, and the
unit then remains in the failed state until it is manually restarted.
Instead, try to restart the service every 30 seconds, forever:

  owl1% systemctl show slurmd | grep -E 'Restart=|RestartUSec='
  Restart=on-failure
  RestartUSec=30s
  owl1% pgrep slurmd
  5903
  owl1% sudo kill -SEGV 5903
  owl1% pgrep slurmd
  6137

Fixes: rarias/jungle#177
Reviewed-by: Aleix Boné <abonerib@bsc.es>
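For illustration, a minimal sketch of how a restart policy like the one shown above could be expressed as a NixOS module. This is not necessarily the change made in this commit: only Restart=on-failure and the 30 second delay come from the transcript. Disabling the start rate limit via StartLimitIntervalSec is an assumption about how "forever" might be achieved, and the snippet assumes the slurm module does not already set conflicting values (otherwise lib.mkForce would be needed).

```nix
{ ... }:
{
  systemd.services.slurmd = {
    serviceConfig = {
      # Restart the daemon when it exits with an error or dies from a signal.
      Restart = "on-failure";
      # Wait 30 seconds between restart attempts (shown as RestartUSec=30s).
      RestartSec = 30;
    };
    # Assumption: disable systemd's start rate limiting so that repeated
    # failures never leave the unit permanently in the failed state.
    unitConfig.StartLimitIntervalSec = 0;
  };
}
```

After a rebuild, the same `systemctl show slurmd` check used in the commit message can confirm the effective values on the node.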
Description
Old jungle repository with big blobs