15 Commits

Author SHA1 Message Date
c307fc9bb3 Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate for identifying cuts
between the login node and the provider, and between the provider and
external hosts like the Google or Cloudflare DNS servers.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-26 12:36:06 +02:00
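
The reasoning in the commit above (use the anella hop to tell a cut between the login node and the provider apart from a cut between the provider and external hosts) can be sketched as a small decision function. This is only an illustrative sketch, not the repository's actual configuration: the function name `locate_cut` and the keys `login`, `anella`, and `external` are hypothetical labels for the ICMP probe results.

```python
from typing import Mapping


def locate_cut(reachable: Mapping[str, bool]) -> str:
    """Guess where connectivity is cut, given ICMP probe results.

    The keys are hypothetical labels (not taken from the repository):
      'login'    - the login node answered the probe
      'anella'   - the provider hop (anella) answered the probe
      'external' - an external host (e.g. 8.8.8.8 or 1.1.1.1) answered
    """
    # Check the nearest hop first and move outwards.
    if not reachable["login"]:
        return "login node unreachable"
    if not reachable["anella"]:
        return "cut between the login node and the provider"
    if not reachable["external"]:
        return "cut between the provider and external hosts"
    return "no cut detected"


if __name__ == "__main__":
    # Example: the provider hop answers but external hosts do not.
    print(locate_cut({"login": True, "anella": True, "external": False}))
```

Checking the nearest hop first means the first failing probe names the segment where the path most likely breaks.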
6f5f234480 Add ICMP probes
These probes check whether we can reach several targets via ICMP, which
is not proxied, so they can be used to see if ICMP forwarding is working
on the login node.

In particular, we test whether we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet), and the login node (ssfhead).

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-24 11:49:42 +02:00
1e9bc4086f Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-20 16:04:15 +02:00
cfa3e08e4b Monitor gitlab.bsc.es too 2023-10-03 09:45:13 +02:00
10101c631d Monitor PM webpage via blackbox 2023-10-03 08:58:07 +02:00
c3ecba513d Allow anonymous access to grafana 2023-09-22 10:50:14 +02:00
4ca4e0fae9 Enable slurm-exporter service 2023-09-21 21:38:34 +02:00
dd616a7fb1 Make exporters listen in localhost only 2023-09-08 18:13:04 +02:00
4495cbf380 Scrape lake2 too 2023-08-29 12:33:26 +02:00
c47c190c79 Scrape metrics from bay 2023-08-29 11:58:00 +02:00
bb8bf86051 Add ceph metrics to prometheus 2023-08-22 16:33:55 +02:00
199358a5e3 Monitor power from other nodes via LAN 2023-08-17 18:55:40 +02:00
776a582c10 Increase prometheus retention time to one year 2023-07-28 16:19:59 +02:00
b978839406 Allow access to devices for node_exporter 2023-07-28 13:48:30 +02:00
3cb263ea71 Simplify flake and expose host pkgs
The configuration of the machines is now moved to m/
2023-06-14 17:28:00 +02:00