91 Commits

Author SHA1 Message Date
baa8347753 Enable public-inbox at jungle.bsc.es/lists
The public-inbox service fetches emails from the sourcehut mailing lists
and displays them on the web. The idea is to reduce the dependency on
external services and add a secondary storage for the mailing lists in
case sourcehut goes down or changes the current free plans.

The service is available at https://jungle.bsc.es/lists/ and is open to
the public. It currently mirrors the bscpkgs and jungle mailing lists.

We also edited the CSS to improve the readability and have larger fonts
by default.

The service for public-inbox produced by NixOS is not well configured to
fetch emails from an IMAP mail server, so we also manually edit the
service file to enable the network.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
777704a9ce Monitor https://pm.bsc.es/gitlab/ too
The GitLab instance is in the /gitlab endpoint and may fail
independently of https://pm.bsc.es/.

Cc: Víctor López <victor.lopez@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
024a31dd1b Enable nixseparatedebuginfod module
The module is only enabled on Hut and Eudy because we noticed activity
on the debuginfod service even if no debug session was active.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 16:40:16 +02:00
afae708a48 Enable runners for pm.bsc.es/gitlab too
The old runners for the PM gitlab were disabled in the configuration
during the last outage, but they kept working until we rebooted the node.
With this change we enable the runners for both PM and gitlab.bsc.es.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
6ac5225ddb Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
cd6983223e Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.

In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet), and the login node (ssfhead).

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
fb8a0cb0a3 Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
a8c0ce5d06 Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
ca0937859d Add docker runner too 2025-10-01 16:40:16 +02:00
4d362351cb Monitor gitlab.bsc.es too 2025-10-01 16:40:16 +02:00
e9b4d87d9f Monitor PM webpage via blackbox 2025-10-01 16:40:16 +02:00
457e403258 Temporarily disable pm runners 2025-10-01 16:40:16 +02:00
32b9cc17a9 Add runner for gitlab.bsc.es 2025-10-01 16:40:16 +02:00
fbabc06641 Allow anonymous access to grafana 2025-10-01 16:40:16 +02:00
b84066fde5 Enable slurm-exporter service 2025-10-01 16:40:16 +02:00
44667e8e40 Monitor storage nodes via IPMI too 2025-10-01 16:40:16 +02:00
66b5074ff1 Serve the nix store from hut 2025-10-01 16:40:16 +02:00
09ac1d6c13 Make exporters listen in localhost only 2025-10-01 16:40:16 +02:00
4c88f9a783 Poweroff idle slurm nodes after 1 hour 2025-10-01 16:40:16 +02:00
aa52236a80 Unlock ovni gitlab runners 2025-10-01 16:40:16 +02:00
6850bf3a71 Add agenix to all nodes 2025-10-01 16:40:16 +02:00
da92154d33 Remove old secrets 2025-10-01 16:40:16 +02:00
8cedffe040 Move the ceph client config to an external module 2025-10-01 16:40:16 +02:00
8a027d8b09 Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2025-10-01 16:40:16 +02:00
76e6ae2f00 Enable binary emulation for other architectures 2025-10-01 16:40:16 +02:00
042ca9e882 Scrape lake2 too 2025-10-01 16:40:16 +02:00
005a1be48a Scrape metrics from bay 2025-10-01 16:40:16 +02:00
af29f639e2 Add fio tool 2025-10-01 16:40:16 +02:00
0fe025e8be Add ceph tools in hut too 2025-10-01 16:40:16 +02:00
81baeee5b1 Disable pixiecore in hut for now 2025-10-01 16:40:16 +02:00
686f750c06 Add PXE helper 2025-10-01 16:40:16 +02:00
3c83996e26 Add agenix to PATH in hut 2025-10-01 16:40:16 +02:00
a4fc3d131a Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2025-10-01 16:40:16 +02:00
660a8ae163 Add rarias key for secrets 2025-10-01 16:40:16 +02:00
91270b26bb Add ceph metrics to prometheus 2025-10-01 16:40:16 +02:00
94ce6fedf9 Mount the ceph filesystem in hut 2025-10-01 16:40:16 +02:00
8fcb5a1079 Monitor power from other nodes via LAN 2025-10-01 16:40:15 +02:00
b80656228d Increase prometheus retention time to one year 2025-10-01 16:40:15 +02:00
ae2007e2fe Allow access to devices for node_exporter 2025-10-01 16:40:15 +02:00
6ec7353a27 Add owl and all partition 2025-10-01 16:40:15 +02:00
d679fd6314 Simplify flake and expose host pkgs
The configuration of the machines is now moved to m/
2025-10-01 16:40:15 +02:00