6f5f234480
Add ICMP probes
...
These probes check whether we can reach several targets via ICMP, which
is not proxied, so they can be used to see if ICMP forwarding is working
on the login node.
In particular, we test whether we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet), and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-24 11:49:42 +02:00
1e9bc4086f
Enable proxy for Grafana too
...
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-20 16:04:15 +02:00
734f52e87f
Make blackbox exporter use the proxy
...
By default it was trying to reach the targets using the default gateway,
but since the power cut of 2023-10-20, the login node has not enabled
forwarding again, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-20 15:34:06 +02:00
72658ee5e6
Add docker runner too
2023-10-04 07:55:26 +02:00
cfa3e08e4b
Monitor gitlab.bsc.es too
2023-10-03 09:45:13 +02:00
10101c631d
Monitor PM webpage via blackbox
2023-10-03 08:58:07 +02:00
4d865d7a7e
Temporarily disable pm runners
2023-09-28 14:14:41 +02:00
d9511dab22
Add runner for gitlab.bsc.es
2023-09-28 14:11:30 +02:00
c3ecba513d
Allow anonymous access to grafana
2023-09-22 10:50:14 +02:00
4ca4e0fae9
Enable slurm-exporter service
2023-09-21 21:38:34 +02:00
de3a28b7df
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
826d6263fd
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
dd616a7fb1
Make exporters listen in localhost only
2023-09-08 18:13:04 +02:00
bdd03dac60
Poweroff idle slurm nodes after 1 hour
2023-09-08 13:31:23 +02:00
d91c9b7473
Unlock ovni gitlab runners
2023-09-05 16:24:27 +02:00
ae4ad95902
Add agenix to all nodes
2023-09-04 22:09:40 +02:00
8fc87885da
Remove old secrets
2023-09-04 22:04:32 +02:00
c13022596a
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
875622ad0f
Reorganize secrets and ssh keys
...
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
48727d3a88
Enable binary emulation for other architectures
2023-08-31 17:22:36 +02:00
4495cbf380
Scrape lake2 too
2023-08-29 12:33:26 +02:00
c47c190c79
Scrape metrics from bay
2023-08-29 11:58:00 +02:00
042e56b5b2
Add fio tool
2023-08-29 11:27:50 +02:00
a510a41eed
Add ceph tools in hut too
2023-08-28 17:58:21 +02:00
300690df4c
Disable pixiecore in hut for now
2023-08-25 13:21:00 +02:00
9d15c13a44
Add PXE helper
2023-08-25 12:03:30 +02:00
591a4c774e
Add agenix to PATH in hut
2023-08-23 17:42:50 +02:00
e8d5eeb5cf
Store ceph secret key in age
...
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2023-08-23 17:18:17 +02:00
2516559fac
Add rarias key for secrets
2023-08-23 17:15:26 +02:00
bb8bf86051
Add ceph metrics to prometheus
2023-08-22 16:33:55 +02:00
2416ec7806
Mount the ceph filesystem in hut
2023-08-22 15:57:49 +02:00
199358a5e3
Monitor power from other nodes via LAN
2023-08-17 18:55:40 +02:00
776a582c10
Increase prometheus retention time to one year
2023-07-28 16:19:59 +02:00
b978839406
Allow access to devices for node_exporter
2023-07-28 13:48:30 +02:00
e0ab4e1408
Add owl and all partition
2023-06-16 11:34:00 +02:00
3cb263ea71
Simplify flake and expose host pkgs
...
The configuration of the machines is now moved to m/
2023-06-14 17:28:00 +02:00