54c2bd119f
Add ICMP probes
These probes check whether we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working on
the login node.
In particular, we test whether we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to ping
only from the intranet), and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
e5d85c1b38
Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
f1486b84c1
Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the power outage of 2023-10-20, the login node has not
re-enabled forwarding, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
425dca3e00
Add docker runner too
2023-10-06 15:17:07 +02:00
e4080cf931
Monitor gitlab.bsc.es too
2023-10-06 15:17:07 +02:00
fc9285f89d
Monitor PM webpage via blackbox
2023-10-06 15:17:07 +02:00
fbe238f5b6
Temporarily disable pm runners
2023-10-06 15:17:07 +02:00
9874da566d
Add runner for gitlab.bsc.es
2023-10-06 15:17:07 +02:00
ebc5c4d84f
Allow anonymous access to grafana
2023-09-22 10:51:30 +02:00
5f492ee1d7
Enable slurm-exporter service
2023-09-21 21:40:02 +02:00
5a5f4672cd
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
b120a7ca85
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
868f825e26
Make exporters listen in localhost only
2023-09-08 18:13:04 +02:00
9c9c41fb57
Poweroff idle slurm nodes after 1 hour
2023-09-08 16:49:53 +02:00
eb9876aff6
Unlock ovni gitlab runners
2023-09-05 16:59:45 +02:00
68f4d54dd1
Add agenix to all nodes
2023-09-04 22:10:43 +02:00
2c8c90e6e4
Remove old secrets
2023-09-04 22:04:32 +02:00
74ec4eb22a
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
0a5f9b55f5
Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
acb91695ac
Enable binary emulation for other architectures
2023-08-31 17:27:08 +02:00
e1d406023d
Scrape lake2 too
2023-08-29 12:33:26 +02:00
1266c8f04e
Scrape metrics from bay
2023-08-29 11:58:00 +02:00
86eacdd3e5
Add fio tool
2023-08-29 11:27:50 +02:00
4fa074f893
Add ceph tools in hut too
2023-08-28 17:58:21 +02:00
f18f1937ae
Disable pixiecore in hut for now
2023-08-25 13:21:00 +02:00
4b78ec9134
Add PXE helper
2023-08-25 12:05:33 +02:00
832866cbfa
Add agenix to PATH in hut
2023-08-23 17:42:50 +02:00
9fc393bb6a
Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2023-08-23 17:26:44 +02:00
d81d9d58e1
Add rarias key for secrets
2023-08-23 17:15:26 +02:00
d54dcc8d8f
Add ceph metrics to prometheus
2023-08-22 16:33:55 +02:00
a5fae4a289
Mount the ceph filesystem in hut
2023-08-22 16:15:46 +02:00
1622b3e7fc
Monitor power from other nodes via LAN
2023-08-22 11:28:54 +02:00
3424cac761
Increase prometheus retention time to one year
2023-08-22 11:28:54 +02:00
e497e1b88b
Allow access to devices for node_exporter
2023-07-28 13:55:35 +02:00
30c21155af
Add owl and all partition
2023-06-16 11:34:00 +02:00
a43016ebee
Simplify flake and expose host pkgs
The configuration of the machines has been moved to m/
2023-06-16 11:31:31 +02:00