15 Commits

Author SHA1 Message Date
19e195b894 Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-27 12:46:08 +02:00
54c2bd119f Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.

In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway which responds to ping
only from the intranet and the login node (ssfhead).

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
e5d85c1b38 Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
e4080cf931 Monitor gitlab.bsc.es too 2023-10-06 15:17:07 +02:00
fc9285f89d Monitor PM webpage via blackbox 2023-10-06 15:17:07 +02:00
ebc5c4d84f Allow anonymous access to grafana 2023-09-22 10:51:30 +02:00
5f492ee1d7 Enable slurm-exporter service 2023-09-21 21:40:02 +02:00
868f825e26 Make exporters listen in localhost only 2023-09-08 18:13:04 +02:00
e1d406023d Scrape lake2 too 2023-08-29 12:33:26 +02:00
1266c8f04e Scrape metrics from bay 2023-08-29 11:58:00 +02:00
d54dcc8d8f Add ceph metrics to prometheus 2023-08-22 16:33:55 +02:00
1622b3e7fc Monitor power from other nodes via LAN 2023-08-22 11:28:54 +02:00
3424cac761 Increase prometheus retention time to one year 2023-08-22 11:28:54 +02:00
e497e1b88b Allow access to devices for node_exporter 2023-07-28 13:55:35 +02:00
a43016ebee Simplify flake and expose host pkgs
The configuration of the machines is now moved to m/
2023-06-16 11:31:31 +02:00