Fix memory limit in fox and remove IPMI watchdog #257
To prevent jobs from using more memory than is available, we enable the cgroup memory limit in SLURM, leaving 1% for the system. The limit is enforced through a cgroup, which is only applied when memory is also configured as a consumable resource. We also need to constrain swap; otherwise the excess memory usage is simply moved to swap.
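As a rough illustration of the 1% reservation, the following sketch computes how many bytes remain available to jobs on a node. The function name `job_memory_limit` and the 128 GiB node size are assumptions made for this example; they are not taken from the actual configuration in this PR.

```python
def job_memory_limit(total_bytes: int, reserved_percent: int = 1) -> int:
    """Return the bytes left for jobs after reserving a share for the system.

    Hypothetical helper: it only mirrors the "leave 1% for the system"
    arithmetic described above, not how SLURM computes the limit internally.
    """
    if not 0 <= reserved_percent <= 100:
        raise ValueError("reserved_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return total_bytes * (100 - reserved_percent) // 100


# Example with an assumed node size of 128 GiB.
node_bytes = 128 * 2**30
limit = job_memory_limit(node_bytes)
print(f"jobs may use {limit} of {node_bytes} bytes")
```

With swap constrained as well, a job that exceeds this amount cannot spill the excess into swap.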
The configuration has been updated on all SLURM nodes, so the limit should now be in effect.
The following program shows how a job is killed on large allocations instead of triggering the kernel OOM killer:
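The test program itself is not included in this text. As a stand-in, here is a minimal hypothetical sketch of such a test in Python: it allocates memory in chunks and touches every page so the memory is actually resident. The function name `allocate_until`, the 64 MiB chunk size, and the 4096-byte page size are assumptions made for illustration, not the original program.

```python
PAGE_SIZE = 4096  # assumed page size; used only to touch each page once


def allocate_until(limit_bytes: int, chunk_bytes: int = 64 * 1024 * 1024) -> int:
    """Allocate and touch memory in chunks until limit_bytes is reached.

    Returns the number of bytes allocated. Inside a job whose cgroup limit
    is lower than limit_bytes, the process is expected to be killed before
    this function returns.
    """
    chunks = []  # keep references so the memory is not freed early
    allocated = 0
    while allocated < limit_bytes:
        size = min(chunk_bytes, limit_bytes - allocated)
        buf = bytearray(size)
        # Write one byte per page so the kernel has to back it with real memory.
        for offset in range(0, size, PAGE_SIZE):
            buf[offset] = 1
        chunks.append(buf)
        allocated += size
    return allocated
```

To reproduce the behaviour described above, call `allocate_until` from inside a SLURM job with a `limit_bytes` value larger than the memory granted to that job.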
We also blacklist the IPMI watchdog, which is not reliable, so we can run with a buggy BMC, and give Dylan access to the owl nodes.
CC @varcila