9871517be2
Add ICMP probes
These probes check whether we can reach several targets via ICMP. ICMP is
not proxied, so the probes show whether ICMP forwarding is working on
the login node.
In particular, we test whether we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway, which responds to ping
only from the intranet, and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
736eacaac5
Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
0e66aad099
Make blackbox exporter use the proxy
By default it was trying to reach the targets through the default gateway,
but since the power cut of 2023-10-20 the login node has not re-enabled
forwarding, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
67a4905a0a
Don't log SLURM connection attempts from ssfhead
2023-10-06 15:22:04 +02:00
d52d22e0db
Add docker runner too
2023-10-06 15:17:07 +02:00
42920c2521
Monitor gitlab.bsc.es too
2023-10-06 15:17:07 +02:00
4acd35e036
Monitor PM webpage via blackbox
2023-10-06 15:17:07 +02:00
621d20db3a
Temporarily disable pm runners
2023-10-06 15:17:07 +02:00
0926f6ec1f
Add runner for gitlab.bsc.es
2023-10-06 15:17:07 +02:00
61646cb3bd
Allow anonymous access to grafana
2023-09-22 10:51:30 +02:00
c0066c4744
Remove user/group when using DynamicUsers
2023-09-22 10:13:06 +02:00
ffd0593f51
Set the SLURM_CONF variable
2023-09-21 22:22:00 +02:00
f49ae0773e
Enable slurm-exporter service
2023-09-21 21:40:02 +02:00
8de3d2b149
Mount the hut nix store for SLURM jobs
2023-09-20 19:38:43 +02:00
bc62e28ca3
Enable direnv integration
2023-09-20 09:32:58 +02:00
653d411b9e
Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs still points to a different version than the one
specified in the jungle flake. Instead, use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
a1e8cfea47
Don't fetch registry flakes from the net
2023-09-15 12:00:28 +02:00
e88805947e
Open ports in firewall of compute nodes
2023-09-14 15:45:43 +02:00
d9d249411d
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
10ca572aec
Enable fstrim service
2023-09-12 16:39:45 +02:00
75b0f48715
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
19a451db77
Add encrypted munge key with agenix
2023-09-08 19:05:45 +02:00
ec9be9bb62
Remove unused large port hole in firewall
2023-09-08 18:22:48 +02:00
7ddd1977f3
Make exporters listen on localhost only
2023-09-08 18:13:04 +02:00
7050c505b5
Allow only some ports for srun
2023-09-08 17:51:37 +02:00
033a1fe97b
Block ssfhead from reaching our slurm daemon
2023-09-08 17:36:28 +02:00
77cb3c494e
Poweroff idle slurm nodes after 1 hour
2023-09-08 16:49:53 +02:00
6db5772ac4
Add IB and IPMI node host names
2023-09-08 13:21:37 +02:00
dca274d020
Unlock ovni gitlab runners
2023-09-05 16:59:45 +02:00
02f40a8217
Add agenix to all nodes
2023-09-04 22:10:43 +02:00
77d43b6da9
Add agenix module to ceph
2023-09-04 22:07:07 +02:00
ab55aac5ff
Remove old secrets
2023-09-04 22:04:32 +02:00
9b5bfbb7a3
Mount /ceph in owl1 and owl2
2023-09-04 22:00:36 +02:00
a69a71d1b0
Warn about the owl2 omnipath device
2023-09-04 22:00:17 +02:00
98374bd303
Clean owl2 configuration
2023-09-04 21:59:56 +02:00
3b6be8a2fc
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
2bb366b9ac
Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
2d16709648
Add anavarro user
2023-09-04 16:00:01 +02:00
9344daa31c
Set zsh inc_append_history option
2023-09-03 16:57:53 +02:00
80c98041b5
Set zsh shell for rarias
2023-09-03 16:46:27 +02:00
3418e57907
Enable zsh and fix key bindings
2023-09-03 16:42:04 +02:00
6848b58e39
Keep a log over time with the config commits
2023-09-03 00:02:14 +02:00
f9c77b433a
Store nixos config in /etc/nixos/config.rev
2023-09-02 23:37:11 +02:00
9d487845f6
Enable binary emulation for other architectures
2023-08-31 17:27:08 +02:00
3c99c2a662
Enable watchdog
2023-08-30 16:32:17 +02:00
7d09108c9f
Enable all osd on boot in lake2
2023-08-30 16:32:17 +02:00
0f0a861896
Scrape lake2 too
2023-08-29 12:33:26 +02:00
beb0d5940e
Also enable monitoring in lake2
2023-08-29 12:29:41 +02:00
70321ce237
Scrape metrics from bay
2023-08-29 11:58:00 +02:00
5bd1d67333
Add monitoring in the bay node
2023-08-29 11:53:32 +02:00