cd6983223e
Add ICMP probes
These probes check whether we can reach several targets via ICMP, which
is not proxied, so they can be used to see if ICMP forwarding is working
on the login node.
In particular, we test whether we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which only responds
to ping from the intranet), and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
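A minimal sketch of how such an ICMP probe could be wired up in NixOS with the Prometheus blackbox exporter. The module name `icmp`, the job name `blackbox-icmp` and the exporter address `127.0.0.1:9115` are assumptions for illustration, not taken from this commit. The BSC gateway and ssfhead addresses are not given here, so only the two public DNS servers are listed.

```nix
{ pkgs, ... }:
{
  # Assumed blackbox exporter setup with a single ICMP module
  # (module name and listen address are illustrative).
  services.prometheus.exporters.blackbox = {
    enable = true;
    listenAddress = "127.0.0.1";
    port = 9115;
    # JSON is valid YAML, so builtins.toJSON is enough for the config file.
    configFile = pkgs.writeText "blackbox.yml" (builtins.toJSON {
      modules.icmp.prober = "icmp";
    });
  };

  services.prometheus.scrapeConfigs = [
    {
      job_name = "blackbox-icmp"; # hypothetical job name
      metrics_path = "/probe";
      params.module = [ "icmp" ];
      static_configs = [
        {
          # The BSC gateway and ssfhead addresses would be added here.
          targets = [ "8.8.8.8" "1.1.1.1" ];
        }
      ];
      relabel_configs = [
        # Pass the target address as the ?target= parameter of /probe.
        { source_labels = [ "__address__" ]; target_label = "__param_target"; }
        # Keep the probed host as the instance label.
        { source_labels = [ "__param_target" ]; target_label = "instance"; }
        # Send the scrape itself to the blackbox exporter.
        { target_label = "__address__"; replacement = "127.0.0.1:9115"; }
      ];
    }
  ];
}
```

The relabeling follows the usual multi-target exporter pattern: Prometheus scrapes the exporter, and the exporter performs the ping to the address passed in `target`.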
fb8a0cb0a3
Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
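A sketch of what adding proxy variables to the Grafana unit could look like in NixOS. The proxy URL below is a hypothetical placeholder, and the exact variables used in the commit are not shown here. Grafana is a Go program, and Go's standard HTTP client generally reads `https_proxy` and `no_proxy` (in either case) from the environment.

```nix
{
  # Hypothetical proxy address; replace it with the real one.
  systemd.services.grafana.environment = {
    https_proxy = "http://proxy.example.org:3128";
    # Keep local traffic (e.g. Prometheus on localhost) off the proxy.
    no_proxy = "localhost,127.0.0.1";
  };
}
```

Setting the variables on the unit, rather than globally, limits the proxy to the service that needs to reach the outside.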
a8c0ce5d06
Make blackbox exporter use the proxy
By default it tried to reach the targets through the default gateway,
but since the power cut of 2023-10-20 the login node has not
re-enabled forwarding, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
9957a0269d
Don't log SLURM connection attempts from ssfhead
2025-10-01 16:40:16 +02:00
ca0937859d
Add docker runner too
2025-10-01 16:40:16 +02:00
4d362351cb
Monitor gitlab.bsc.es too
2025-10-01 16:40:16 +02:00
e9b4d87d9f
Monitor PM webpage via blackbox
2025-10-01 16:40:16 +02:00
457e403258
Temporarily disable pm runners
2025-10-01 16:40:16 +02:00
32b9cc17a9
Add runner for gitlab.bsc.es
2025-10-01 16:40:16 +02:00
fbabc06641
Allow anonymous access to grafana
2025-10-01 16:40:16 +02:00
7b67b2b703
Remove user/group when using DynamicUsers
2025-10-01 16:40:16 +02:00
ce964b9b65
Set the SLURM_CONF variable
2025-10-01 16:40:16 +02:00
b84066fde5
Enable slurm-exporter service
2025-10-01 16:40:16 +02:00
b84d1d5e26
Mount the hut nix store for SLURM jobs
2025-10-01 16:40:16 +02:00
6d8fd353d0
Enable direnv integration
2025-10-01 16:40:16 +02:00
642507b255
Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs still points to a different version than the one specified
in the jungle flake. Instead, use jungle#bscpkgs.X to get a package
from bscpkgs.
2025-10-01 16:40:16 +02:00
0ca0da9ffe
Don't fetch registry flakes from the net
2025-10-01 16:40:16 +02:00
1b296f2ce7
Open ports in firewall of compute nodes
2025-10-01 16:40:16 +02:00
44667e8e40
Monitor storage nodes via IPMI too
2025-10-01 16:40:16 +02:00
627c912b87
Enable fstrim service
2025-10-01 16:40:16 +02:00
66b5074ff1
Serve the nix store from hut
2025-10-01 16:40:16 +02:00
79446cebcb
Add encrypted munge key with agenix
2025-10-01 16:40:16 +02:00
061fc60939
Remove unused large port hole in firewall
2025-10-01 16:40:16 +02:00
09ac1d6c13
Make exporters listen on localhost only
2025-10-01 16:40:16 +02:00
a6324e47e8
Allow only some ports for srun
2025-10-01 16:40:16 +02:00
2f258e1cdd
Block ssfhead from reaching our slurm daemon
2025-10-01 16:40:16 +02:00
4c88f9a783
Poweroff idle slurm nodes after 1 hour
2025-10-01 16:40:16 +02:00
01140353c6
Add IB and IPMI node host names
2025-10-01 16:40:16 +02:00
aa52236a80
Unlock ovni gitlab runners
2025-10-01 16:40:16 +02:00
6850bf3a71
Add agenix to all nodes
2025-10-01 16:40:16 +02:00
aa92294907
Add agenix module to ceph
2025-10-01 16:40:16 +02:00
da92154d33
Remove old secrets
2025-10-01 16:40:16 +02:00
adec7f80fd
Mount /ceph in owl1 and owl2
2025-10-01 16:40:16 +02:00
8a0034a867
Warn about the owl2 omnipath device
2025-10-01 16:40:16 +02:00
6828273c05
Clean owl2 configuration
2025-10-01 16:40:16 +02:00
8cedffe040
Move the ceph client config to an external module
2025-10-01 16:40:16 +02:00
8a027d8b09
Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2025-10-01 16:40:16 +02:00
1f32b8409a
Add anavarro user
2025-10-01 16:40:16 +02:00
bc51564a88
Set zsh inc_append_history option
2025-10-01 16:40:16 +02:00
8ba4f910c3
Set zsh shell for rarias
2025-10-01 16:40:16 +02:00
515fa49ed0
Enable zsh and fix key bindings
2025-10-01 16:40:16 +02:00
c63fa494d5
Keep a log over time with the config commits
2025-10-01 16:40:16 +02:00
a6d3f43b98
Store nixos config in /etc/nixos/config.rev
2025-10-01 16:40:16 +02:00
76e6ae2f00
Enable binary emulation for other architectures
2025-10-01 16:40:16 +02:00
409efacf5b
Enable watchdog
2025-10-01 16:40:16 +02:00
e1e879178d
Enable all osd on boot in lake2
2025-10-01 16:40:16 +02:00
042ca9e882
Scrape lake2 too
2025-10-01 16:40:16 +02:00
9241bda0ac
Also enable monitoring in lake2
2025-10-01 16:40:16 +02:00
005a1be48a
Scrape metrics from bay
2025-10-01 16:40:16 +02:00
f86114f33e
Add monitoring in the bay node
2025-10-01 16:40:16 +02:00