5003139e8e
Enable nixseparatedebuginfod module
The module is only enabled on Hut and Eudy because we noticed activity
on the debuginfod service even if no debug session was active.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 16:40:16 +02:00
845adfc937
Use tmpfs in /tmp
The /tmp directory was stored on the SSD, which is not erased across
boots. Nix uses /tmp to perform the builds, so we want it to be as
fast as possible. In general, all the machines have enough space to
handle large builds like LLVM.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
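For illustration, a tmpfs-backed /tmp can be enabled in NixOS roughly as follows. This is a minimal sketch and not the contents of the commit: it assumes NixOS 23.05 or later, where the option is named boot.tmp.useTmpfs (older releases used boot.tmpOnTmpfs), and the size limit shown is a hypothetical value.

```nix
{
  # Mount /tmp as a tmpfs so that Nix builds run in RAM instead of on the SSD.
  # The contents are also discarded on every boot.
  boot.tmp.useTmpfs = true;

  # Hypothetical limit, not taken from the commit. It must be large enough
  # for big builds such as LLVM.
  boot.tmp.tmpfsSize = "75%";
}
```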
52b4cba900
Enable runners for pm.bsc.es/gitlab too
The old runners for the PM gitlab were disabled in the configuration
during the last outage, but they kept working until we rebooted the node.
With this change we enable the runners for both PM and gitlab.bsc.es.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
7b22865a1e
Remove complete ceph package from hut
Only the ceph-client is needed.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
6ed2d2e089
Fix warning in slurm exporter using vendorHash
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
d557158c3f
Remove old Ceph package overlay
The Ceph package is now integrated in upstream nixpkgs.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
4e0fc52927
flake.lock: Update
Flake lock file updates:
• Updated input 'agenix':
'github:ryantm/agenix/d8c973fd228949736dedf61b7f8cc1ece3236792' (2023-07-24)
→ 'github:ryantm/agenix/daf42cb35b2dc614d1551e37f96406e4c4a2d3e4' (2023-10-08)
• Updated input 'bscpkgs':
'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=f605f8e5e4a1f392589f1ea2b9ffe2074f72a538' (2023-10-31)
→ 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=e148de50d68b3eeafc3389b331cf042075971c4b' (2023-11-22)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
→ 'github:NixOS/nixpkgs/e4ad989506ec7d71f7302cc3067abd82730a4beb' (2023-11-19)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
34628c0e39
BSC packages are no longer in bsc attribute
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
2703cd456d
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
→ 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=f605f8e5e4a1f392589f1ea2b9ffe2074f72a538' (2023-10-31)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
9325c203fb
Switch bscpkgs URL to sourcehut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
5d1f008199
Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
f9622b19ef
Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.
In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to ping
only from the intranet) and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
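For illustration, ICMP probes of this kind could be declared with the blackbox exporter in NixOS roughly as follows. This is a sketch under assumptions, not the commit's contents: the job name, the module name "icmp" and the exporter address 127.0.0.1:9115 (the exporter's default port) are chosen for the example, and the BSC gateway target is left out because its address is not given in the message.

```nix
{ pkgs, ... }:
{
  services.prometheus.exporters.blackbox = {
    enable = true;
    # JSON is valid YAML, so the probe module can be generated from Nix.
    configFile = pkgs.writeText "blackbox.yml" (builtins.toJSON {
      modules.icmp.prober = "icmp";
    });
  };

  services.prometheus.scrapeConfigs = [{
    job_name = "blackbox-icmp"; # hypothetical job name
    metrics_path = "/probe";
    params.module = [ "icmp" ];
    static_configs = [{
      # Google DNS, Cloudflare DNS and the login node.
      targets = [ "8.8.8.8" "1.1.1.1" "ssfhead" ];
    }];
    relabel_configs = [
      # Pass the original target to the exporter as ?target=...
      { source_labels = [ "__address__" ]; target_label = "__param_target"; }
      # Keep the probed host as the instance label.
      { source_labels = [ "__param_target" ]; target_label = "instance"; }
      # Scrape the blackbox exporter itself instead of the target.
      { target_label = "__address__"; replacement = "127.0.0.1:9115"; }
    ];
  }];
}
```

The relabeling is the usual blackbox pattern: Prometheus always scrapes the exporter, and the exporter sends the ICMP echo to the host given in the target parameter.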
f2d26fd2e2
Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
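For illustration, proxy variables can be added to the environment of the Grafana systemd unit roughly as follows. This is a minimal sketch: the proxy URL is a hypothetical placeholder and the exact set of variables used in the commit is not shown in the message.

```nix
{
  systemd.services.grafana.environment = {
    # Hypothetical proxy address, not taken from the commit.
    http_proxy = "http://proxy.example:3128";
    https_proxy = "http://proxy.example:3128";
    # Keep local scrapes and datasources off the proxy.
    no_proxy = "localhost,127.0.0.1";
  };
}
```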
bf8f0ac583
Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
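For illustration, an HTTP probe module of the blackbox exporter can be pointed at a proxy through its proxy_url setting. This is a sketch under assumptions: the module name "http_2xx" and the proxy URL are hypothetical, and the commit may configure the proxy differently.

```nix
{ pkgs, ... }:
{
  services.prometheus.exporters.blackbox = {
    enable = true;
    # JSON is valid YAML, so the module definition can be generated from Nix.
    configFile = pkgs.writeText "blackbox.yml" (builtins.toJSON {
      modules.http_2xx = {
        prober = "http";
        # Send HTTP probes through the proxy instead of the default gateway.
        # Hypothetical proxy address, not taken from the commit.
        http.proxy_url = "http://proxy.example:3128";
      };
    });
  };
}
```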
4e333dca21
Don't log SLURM connection attempts from ssfhead
2025-10-01 16:40:16 +02:00
33c1da6c40
Add docker runner too
2025-10-01 16:40:16 +02:00
5dc41a86e5
Monitor gitlab.bsc.es too
2025-10-01 16:40:16 +02:00
697c3d884e
Monitor PM webpage via blackbox
2025-10-01 16:40:16 +02:00
5a537c7478
Temporarily disable pm runners
2025-10-01 16:40:16 +02:00
c06b706e49
Add runner for gitlab.bsc.es
2025-10-01 16:40:16 +02:00
270cff123d
Allow anonymous access to grafana
2025-10-01 16:40:16 +02:00
b219badaaf
Remove user/group when using DynamicUser
2025-10-01 16:40:16 +02:00
f4fcb7c72c
Set the SLURM_CONF variable
2025-10-01 16:40:16 +02:00
b4ede66387
Enable slurm-exporter service
2025-10-01 16:40:16 +02:00
ec351a157c
Add prometheus-slurm-exporter package
2025-10-01 16:40:16 +02:00
63d63fd39a
Mount the hut nix store for SLURM jobs
2025-10-01 16:40:16 +02:00
beae9d240e
Enable direnv integration
2025-10-01 16:40:16 +02:00
e925b00489
Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead, use jungle#bscpkgs.X to get a
package from bscpkgs.
2025-10-01 16:40:16 +02:00
5594e3615d
Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2025-10-01 16:40:16 +02:00
384c4ee766
Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2025-10-01 16:40:16 +02:00
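For illustration, a flake can expose the package set of one of its NixOS configurations as its default packages roughly as follows. This is a sketch and not the commit's contents: the system string, the hut.nix path and the shape of the outputs are assumptions.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs";

  outputs = { self, nixpkgs, ... }: {
    nixosConfigurations.hut = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./hut.nix ]; # hypothetical module path
    };

    # Reuse the fully overlaid package set of hut, so that attributes
    # such as htop or bsc.ovni resolve directly under the flake.
    packages.x86_64-linux = self.nixosConfigurations.hut.pkgs;
  };
}
```

With outputs like these, `nix build jungle#htop` evaluates the package with the same nixpkgs revision and overlays that hut itself uses.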
87871de141
Don't fetch registry flakes from the net
2025-10-01 16:40:16 +02:00
c5a058f96a
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2025-10-01 16:40:16 +02:00
79077477e1
Revert "Update slurm to 23.02.05.1"
This reverts commit aaefddc44a9073166ac52b8bd56ac96258d3b053.
2025-10-01 16:40:16 +02:00
f5a6055f21
Open ports in firewall of compute nodes
2025-10-01 16:40:16 +02:00
0333b57851
Update slurm to 23.02.05.1
2025-10-01 16:40:16 +02:00
00068cb11c
Monitor storage nodes via IPMI too
2025-10-01 16:40:16 +02:00
a992b266bb
Enable fstrim service
2025-10-01 16:40:16 +02:00
c26cff7bdb
Serve the nix store from hut
2025-10-01 16:40:16 +02:00
1b5469af13
Add encrypted munge key with agenix
2025-10-01 16:40:16 +02:00
78c883a274
Remove unused large port hole in firewall
2025-10-01 16:40:16 +02:00
3385252f5f
Make exporters listen on localhost only
2025-10-01 16:40:16 +02:00
241b888a7c
Allow only some ports for srun
2025-10-01 16:40:16 +02:00
b7aba3d15c
Block ssfhead from reaching our slurm daemon
2025-10-01 16:40:16 +02:00
e35b51cd00
Poweroff idle slurm nodes after 1 hour
2025-10-01 16:40:16 +02:00
2e460f49bd
Add IB and IPMI node host names
2025-10-01 16:40:16 +02:00
a13a2caf57
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2025-10-01 16:40:16 +02:00
ac3817d99b
Unlock ovni gitlab runners
2025-10-01 16:40:16 +02:00
c1d9b01ed1
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
→ 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2025-10-01 16:40:16 +02:00
e7aa2d3fe3
Add agenix to all nodes
2025-10-01 16:40:16 +02:00
d2860ce437
Add agenix module to ceph
2025-10-01 16:40:16 +02:00