81 Commits

Author SHA1 Message Date
e9b4d87d9f Monitor PM webpage via blackbox 2025-10-01 16:40:16 +02:00
457e403258 Temporarily disable pm runners 2025-10-01 16:40:16 +02:00
32b9cc17a9 Add runner for gitlab.bsc.es 2025-10-01 16:40:16 +02:00
fbabc06641 Allow anonymous access to grafana 2025-10-01 16:40:16 +02:00
b84066fde5 Enable slurm-exporter service 2025-10-01 16:40:16 +02:00
44667e8e40 Monitor storage nodes via IPMI too 2025-10-01 16:40:16 +02:00
66b5074ff1 Serve the nix store from hut 2025-10-01 16:40:16 +02:00
09ac1d6c13 Make exporters listen on localhost only 2025-10-01 16:40:16 +02:00
4c88f9a783 Poweroff idle slurm nodes after 1 hour 2025-10-01 16:40:16 +02:00
aa52236a80 Unlock ovni gitlab runners 2025-10-01 16:40:16 +02:00
6850bf3a71 Add agenix to all nodes 2025-10-01 16:40:16 +02:00
da92154d33 Remove old secrets 2025-10-01 16:40:16 +02:00
8cedffe040 Move the ceph client config to an external module 2025-10-01 16:40:16 +02:00
8a027d8b09 Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2025-10-01 16:40:16 +02:00
76e6ae2f00 Enable binary emulation for other architectures 2025-10-01 16:40:16 +02:00
042ca9e882 Scrape lake2 too 2025-10-01 16:40:16 +02:00
005a1be48a Scrape metrics from bay 2025-10-01 16:40:16 +02:00
af29f639e2 Add fio tool 2025-10-01 16:40:16 +02:00
0fe025e8be Add ceph tools in hut too 2025-10-01 16:40:16 +02:00
81baeee5b1 Disable pixiecore in hut for now 2025-10-01 16:40:16 +02:00
686f750c06 Add PXE helper 2025-10-01 16:40:16 +02:00
3c83996e26 Add agenix to PATH in hut 2025-10-01 16:40:16 +02:00
a4fc3d131a Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2025-10-01 16:40:16 +02:00
660a8ae163 Add rarias key for secrets 2025-10-01 16:40:16 +02:00
91270b26bb Add ceph metrics to prometheus 2025-10-01 16:40:16 +02:00
94ce6fedf9 Mount the ceph filesystem in hut 2025-10-01 16:40:16 +02:00
8fcb5a1079 Monitor power from other nodes via LAN 2025-10-01 16:40:15 +02:00
b80656228d Increase prometheus retention time to one year 2025-10-01 16:40:15 +02:00
ae2007e2fe Allow access to devices for node_exporter 2025-10-01 16:40:15 +02:00
6ec7353a27 Add owl and all partition 2025-10-01 16:40:15 +02:00
d679fd6314 Simplify flake and expose host pkgs
The configuration of the machines is now moved to m/
2025-10-01 16:40:15 +02:00