9871517be2
Add ICMP probes
These probes check whether we can reach several targets via ICMP, which
is not proxied, so they can be used to see if ICMP forwarding is working
on the login node.
In particular, we test whether we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet), and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
736eacaac5
Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
0e66aad099
Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the power cut of 2023-10-20, the login node has not enabled
forwarding again, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
d52d22e0db
Add docker runner too
2023-10-06 15:17:07 +02:00
42920c2521
Monitor gitlab.bsc.es too
2023-10-06 15:17:07 +02:00
4acd35e036
Monitor PM webpage via blackbox
2023-10-06 15:17:07 +02:00
621d20db3a
Temporarily disable pm runners
2023-10-06 15:17:07 +02:00
0926f6ec1f
Add runner for gitlab.bsc.es
2023-10-06 15:17:07 +02:00
61646cb3bd
Allow anonymous access to grafana
2023-09-22 10:51:30 +02:00
f49ae0773e
Enable slurm-exporter service
2023-09-21 21:40:02 +02:00
d9d249411d
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
75b0f48715
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
7ddd1977f3
Make exporters listen in localhost only
2023-09-08 18:13:04 +02:00
77cb3c494e
Poweroff idle slurm nodes after 1 hour
2023-09-08 16:49:53 +02:00
dca274d020
Unlock ovni gitlab runners
2023-09-05 16:59:45 +02:00
02f40a8217
Add agenix to all nodes
2023-09-04 22:10:43 +02:00
ab55aac5ff
Remove old secrets
2023-09-04 22:04:32 +02:00
3b6be8a2fc
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
2bb366b9ac
Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
9d487845f6
Enable binary emulation for other architectures
2023-08-31 17:27:08 +02:00
0f0a861896
Scrape lake2 too
2023-08-29 12:33:26 +02:00
70321ce237
Scrape metrics from bay
2023-08-29 11:58:00 +02:00
fad9df61e1
Add fio tool
2023-08-29 11:27:50 +02:00
d2a80c8c18
Add ceph tools in hut too
2023-08-28 17:58:21 +02:00
3416416864
Disable pixiecore in hut for now
2023-08-25 13:21:00 +02:00
815888fb07
Add PXE helper
2023-08-25 12:05:33 +02:00
077eece6b9
Add agenix to PATH in hut
2023-08-23 17:42:50 +02:00
b3ef53de51
Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2023-08-23 17:26:44 +02:00
e0852ee89b
Add rarias key for secrets
2023-08-23 17:15:26 +02:00
dfffc0bdce
Add ceph metrics to prometheus
2023-08-22 16:33:55 +02:00
8257c245b1
Mount the ceph filesystem in hut
2023-08-22 16:15:46 +02:00
f8fb5fa4ff
Monitor power from other nodes via LAN
2023-08-22 11:28:54 +02:00
acf9b71f04
Increase prometheus retention time to one year
2023-08-22 11:28:54 +02:00
55d6c17776
Allow access to devices for node_exporter
2023-07-28 13:55:35 +02:00
2e95281af5
Add owl and all partition
2023-06-16 11:34:00 +02:00
f4ac9f3186
Simplify flake and expose host pkgs
The configuration of the machines has been moved to m/.
2023-06-16 11:31:31 +02:00