9eebe67402
Limit slurm partition users with AllowGroups
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 33s
Fixes #245
2026-03-13 11:57:05 +01:00
f71e807d47
Add remote sblame probe to prometheus
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 8s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-11 16:48:15 +01:00
461d96dc75
Allow access to postgresql socket from CI runner
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 3s
CI / build:cross (push) Successful in 8s
Fixes: #237
Cc: Antoni Navarro <antoni.navarro@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-11 12:41:06 +01:00
26d9e3d432
Grant gitlab-runner user access to perftestsdb
...
Cc: Antoni Navarro <antoni.navarro@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-11 12:40:21 +01:00
5c30975b8b
Mount NFS home in tent at /nfs/home
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 8s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-09 15:27:41 +01:00
d4c00679ee
Increase NFS subnet to allow tent
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-09 15:27:41 +01:00
32a576e870
Copy Gitea backup in /ceph too
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-09 15:27:37 +01:00
8197221146
Mount /ceph in tent
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-09 08:52:11 +01:00
374cd4ce48
Allow tent to reach ceph
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-09 08:52:08 +01:00
46b7efb5ac
Rename Gitea backup service and directory
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-09 08:51:48 +01:00
56ab099017
Override files in rotating gitea dump service
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-03-09 08:51:44 +01:00
2654b9fdd9
Enable rotating gitea backups
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-03-09 08:51:23 +01:00
84a5cb09ee
Use host mode for docker network
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 8s
In order to reduce the traffic of the secondary Ethernet device we need
to be able to directly use the physical device instead of the virtual
one. For now use the host mode and see later if we can revert it.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-03-05 15:29:23 +01:00
4899d20748
Fix weasel infiniband interface name
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 17s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 8s
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-02-26 10:26:01 +01:00
76cd6d64b2
Add ssanzmar user to apex and fox
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 8s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-02-24 14:06:12 +01:00
8dab0d82ba
Update fox documentation in website
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 3s
CI / build:cross (push) Successful in 8s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-02-04 15:08:13 +01:00
958dcd4774
Add emonteir user to apex and fox
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-02-04 15:08:08 +01:00
7a6e4232de
Add nom and nixfmt-tree to system packages
...
CI / build:all (pull_request) Successful in 55m38s
CI / build:all (push) Successful in 27m13s
CI / build:cross (push) Successful in 55m5s
CI / build:cross (pull_request) Successful in 8s
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-02-03 15:17:30 +01:00
3b56e905e5
Add standalone home-manager to system packages
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-02-03 15:17:29 +01:00
2d41309466
Format and sort default package list
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-02-03 15:17:24 +01:00
deb0cd1488
Allow USB access to TC1 from Gitlab Runner
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 8s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-23 17:56:16 +01:00
cd1f502ecc
Allow user USB access to FTDI device in tent
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-23 17:56:11 +01:00
dda6a66782
Fix gitea user to allow sending email
...
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 8s
In order to send email, the gitea user needs to be in the mail-robot
group.
Fixes: #220
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-20 12:18:52 +01:00
22420e6ac8
Remove unneeded perf package from eudy
...
It is already included in the base list of packages, which is now only
"perf" and doesn't depend on the kernel version.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-20 12:18:49 +01:00
a71cd78b4c
Fix infiniband interface names
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-20 12:18:46 +01:00
933c78a80b
Fix moved package linuxPackages.perf is now perf
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-01-20 12:15:10 +01:00
150969be9b
Fix replaced nixseparatedebuginfod
...
nixseparatedebuginfod has been replaced by nixseparatedebuginfod2
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-01-20 12:15:06 +01:00
779449f1db
Fix renamed option watchdog.runtimeTime
...
The option 'systemd.watchdog.runtimeTime' has been renamed to
'systemd.settings.Manager.RuntimeWatchdogSec'.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-01-20 12:14:59 +01:00
859eebda98
Change varcila shell to zsh
...
CI / build:all (push) Successful in 59m37s
CI / build:cross (push) Successful in 1h27m33s
CI / build:cross (pull_request) Successful in 1h29m20s
CI / build:all (pull_request) Successful in 1h29m22s
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2026-01-07 13:22:17 +01:00
c2a201b085
Increase fail2ban ban time on each attempt
...
CI / build:all (push) Has been cancelled
CI / build:cross (push) Has been cancelled
CI / build:all (pull_request) Successful in 1h38m5s
CI / build:cross (pull_request) Successful in 1h38m3s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-07 13:14:34 +01:00
f921f0a4bd
Disable password login via SSH in apex
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-07 13:14:30 +01:00
aa16bfc0bc
Enable fail2ban in apex login node
...
We are seeing a lot of failed attempts from the same IPs:
apex% sudo journalctl -u sshd -b0 | grep 'Failed password' | wc -l
2441
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2026-01-07 13:14:22 +01:00
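The count in the message above shows only the total number of failed attempts. A minimal sketch of how the attempts could be grouped by source IP follows. The log lines and the `sample_log` and `failed_by_ip` helper names are hypothetical, made up for illustration; on a real node the input would come from `journalctl -u sshd -b0` instead.

```shell
# Hypothetical sample of sshd journal lines (documentation IP ranges).
# Real input would come from: sudo journalctl -u sshd -b0
sample_log() {
cat <<'EOF'
Jan 07 10:00:01 apex sshd[100]: Failed password for root from 203.0.113.5 port 4242 ssh2
Jan 07 10:00:03 apex sshd[101]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2
Jan 07 10:00:09 apex sshd[102]: Accepted publickey for alice from 198.51.100.7 port 5000 ssh2
Jan 07 10:01:00 apex sshd[103]: Failed password for root from 192.0.2.9 port 6000 ssh2
EOF
}

# Read log lines on stdin and print "<count> <ip>", most frequent first.
# The IP is taken as the word that follows "from" (assumes no user is
# literally named "from").
failed_by_ip() {
  grep 'Failed password' |
    awk '{ for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'
}

sample_log | failed_by_ip
```

A few IPs dominating this list would support the observation that the attempts come from the same sources, which is the case where banning by IP pays off.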
fc69ef3217
Enable pam_slurm_adopt in all compute nodes
...
CI / build:cross (pull_request) Successful in 5s
CI / build:all (pull_request) Successful in 15s
CI / build:all (push) Successful in 3s
CI / build:cross (push) Successful in 6s
Prevents access to owl1 and owl2 too if the user doesn't have any jobs
running there.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-31 11:41:50 +01:00
1d025f7a38
Don't suspend owl compute nodes
...
Currently the owl nodes are located on top of the rack and turning them
off causes a high temperature increase in that region, which accumulates
heat from the whole rack. To maximize airflow we will leave them on at
all times. This also makes allocations immediate at the extra cost of
around 200 W.
In the future, if we include more nodes in SLURM we can configure those
to turn off if needed.
Fixes: #156
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-31 11:41:44 +01:00
5ff1b1343b
Add nixgen to all machines
...
CI / build:cross (pull_request) Successful in 5s
CI / build:all (pull_request) Successful in 16s
CI / build:all (push) Successful in 3s
CI / build:cross (push) Successful in 5s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-29 16:28:05 +01:00
019826d09e
Add OmpSs-2 release timers and services
...
CI / build:cross (pull_request) Successful in 6s
CI / build:all (pull_request) Successful in 15s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 6s
Send a reminder email to the STAR group to mark the release cycle dates.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-28 12:38:37 +01:00
a294daf7e3
Use specific mail-robot group to send mail
...
Allows any user to be able to send mail from the robot account as long
as it is added to the mail-robot group.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-28 12:38:17 +01:00
e3d1785285
Run a shell in the allocated node with salloc
...
By default, salloc will open a new shell in the *current* node instead
of in the allocated node. This often causes users to leave the extra
shell running once the allocation ends. Repeating this process several
times causes chains of shells.
By running the shell in the remote node, once the allocation ends the
shell finishes as well.
Fixes: #174
See: https://slurm.schedmd.com/faq.html#prompt
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-28 11:44:14 +01:00
14f2393d30
Update website
...
CI / build:cross (pull_request) Successful in 6s
CI / build:all (pull_request) Successful in 15s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 6s
Add apex page and replace bscpkgs references for jungle after the merge.
See: rarias/jungle-website#1
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-22 15:48:13 +02:00
f115d611e7
Add aaguirre user
...
CI / build:cross (pull_request) Successful in 6s
CI / build:all (pull_request) Successful in 15s
CI / build:all (push) Successful in 3s
CI / build:cross (push) Successful in 6s
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-22 15:28:29 +02:00
4261d327c6
Include agenix module and package directly
...
CI / build:cross (pull_request) Successful in 6s
CI / build:all (pull_request) Successful in 15s
CI / build:all (push) Successful in 4s
CI / build:cross (push) Successful in 6s
Avoids adding an extra flake input only to fetch a single module and
package.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-14 09:37:47 +02:00
98d17b19d3
Enable custom sys-devices system feature
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-09 11:40:44 +02:00
188ba6df0a
Remove bscpkgs input
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:26 +02:00
e42058f08b
Allow access to hut from fox
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-02 17:03:21 +02:00
f3bfe89f27
Fetch website from its own git repository
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-02 15:45:21 +02:00
b040bebd1d
Add acinca user
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 12:27:43 +02:00
f69629d2da
Restart slurmd on failure
...
A failure to reach the control node can cause slurmd to fail and the
unit remains in the failed state until it is manually restarted. Instead,
try to restart the service every 30 seconds, forever:
owl1% systemctl show slurmd | grep -E 'Restart=|RestartUSec='
Restart=on-failure
RestartUSec=30s
owl1% pgrep slurmd
5903
owl1% sudo kill -SEGV 5903
owl1% pgrep slurmd
6137
Fixes: #177
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-30 17:20:39 +02:00
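The manual check in the message above could also be scripted. This is only a sketch under stated assumptions: `show_output` is a hypothetical stand-in that replays the two properties quoted in the commit message, and `check_restart_policy` is an illustrative helper name, not something taken from the repository. On a real node the input would come from `systemctl show slurmd`.

```shell
# Hypothetical stand-in for: systemctl show slurmd
# It replays the two properties quoted in the commit message.
show_output() {
cat <<'EOF'
Restart=on-failure
RestartUSec=30s
EOF
}

# Read "Key=Value" properties on stdin; succeed only if the unit is set
# to restart on failure with a 30 second delay.
check_restart_policy() {
  props="$(cat)"
  printf '%s\n' "$props" | grep -qx 'Restart=on-failure' &&
    printf '%s\n' "$props" | grep -qx 'RestartUSec=30s'
}

if show_output | check_restart_policy; then
  echo "restart policy ok"
else
  echo "restart policy missing"
fi
```

Matching whole lines with `grep -x` avoids false positives from similarly named properties.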
0668f0db74
Lower connect timeout when using hut substituter
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-29 18:44:48 +02:00
5fcd57a061
Use hut substituter in all nodes
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-29 18:44:38 +02:00
ad1544759f
Remove machine access for user csiringo
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-29 18:23:24 +02:00