c307fc9bb3
Monitor anella instead of gw.bsc.es
...
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-26 12:36:06 +02:00
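The reasoning in the commit above (use the anella hop to tell where a cut is) can be sketched as a small decision function. This is only an illustrative sketch, not the monitoring configuration from the repository: the names `Cut` and `locate_cut` are hypothetical, and it assumes that a successful probe to an external host (such as the Google or Cloudflare DNS servers) means the whole path is up, even if the anella hop itself does not answer.

```python
from enum import Enum


class Cut(Enum):
    """Where the connectivity break appears to be (hypothetical labels)."""
    NONE = "no cut detected"
    LOGIN_TO_PROVIDER = "between the login node and the provider"
    PROVIDER_TO_EXTERNAL = "between the provider and external hosts"


def locate_cut(anella_ok: bool, external_ok: bool) -> Cut:
    """Classify a cut from two ICMP probe results.

    anella_ok:   the probe to the anella hop succeeded.
    external_ok: the probe to an external host succeeded.
    """
    if external_ok:
        # Assumption: reaching an external host implies the full path works.
        return Cut.NONE
    if anella_ok:
        # The provider hop answers, so the break is beyond it.
        return Cut.PROVIDER_TO_EXTERNAL
    # Neither answers: the break is before or at the provider hop.
    return Cut.LOGIN_TO_PROVIDER


if __name__ == "__main__":
    print(locate_cut(anella_ok=True, external_ok=False).value)
```

In a real setup the two booleans would come from the probe results that the monitoring already collects; here they are plain arguments so the logic can be read in isolation.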
6f5f234480
Add ICMP probes
...
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.
In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway, which responds to ping
only from the intranet, and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-24 11:49:42 +02:00
1e9bc4086f
Enable proxy for Grafana too
...
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-20 16:04:15 +02:00
734f52e87f
Make blackbox exporter use the proxy
...
By default it tried to reach the targets through the default gateway,
but since the power cut of 2023-10-20 the login node has not re-enabled
forwarding, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-20 15:34:06 +02:00
18908c3019
Don't log SLURM connection attempts from ssfhead
2023-10-04 08:19:09 +02:00
72658ee5e6
Add docker runner too
2023-10-04 07:55:26 +02:00
cfa3e08e4b
Monitor gitlab.bsc.es too
2023-10-03 09:45:13 +02:00
10101c631d
Monitor PM webpage via blackbox
2023-10-03 08:58:07 +02:00
4d865d7a7e
Temporarily disable pm runners
2023-09-28 14:14:41 +02:00
d9511dab22
Add runner for gitlab.bsc.es
2023-09-28 14:11:30 +02:00
c3ecba513d
Allow anonymous access to grafana
2023-09-22 10:50:14 +02:00
24c05e5ebf
Remove user/group when using DynamicUsers
2023-09-22 10:13:06 +02:00
7aef154dd4
Set the SLURM_CONF variable
2023-09-21 22:18:30 +02:00
4ca4e0fae9
Enable slurm-exporter service
2023-09-21 21:38:34 +02:00
7b686d0ea4
Add prometheus-slurm-exporter package
2023-09-21 21:34:18 +02:00
d4c803dbfb
Mount the hut nix store for SLURM jobs
2023-09-20 18:26:48 +02:00
94ead9b759
Enable direnv integration
2023-09-17 22:27:51 +02:00
e0b3dd961c
Remove bscpkgs from the registry and nixPath
...
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead, use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 11:58:47 +02:00
656de00d65
Add bscpkgs and nixpkgs top level attributes
...
Allows the evaluation of packages of the intermediate overlays.
2023-09-15 11:58:10 +02:00
fefdbe9c55
Use hut packages as the default package set
...
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2023-09-14 18:28:09 +02:00
c73a337471
Don't fetch registry flakes from the net
2023-09-15 09:13:24 +02:00
dbd57ed57f
flake.lock: Update
...
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2023-09-14 18:09:05 +02:00
010491618e
Revert "Update slurm to 23.02.05.1"
...
This reverts commit aaefddc44a9073166ac52b8bd56ac96258d3b053.
2023-09-14 15:46:18 +02:00
722c0b0eaa
Open ports in firewall of compute nodes
2023-09-14 15:45:43 +02:00
772e0f00fb
Update slurm to 23.02.05.1
2023-09-13 17:44:24 +02:00
de3a28b7df
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
a05d87d4b9
Enable fstrim service
2023-09-12 16:39:45 +02:00
826d6263fd
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
b0b04e8fb1
Add encrypted munge key with agenix
2023-09-08 19:01:57 +02:00
a5e81fea95
Remove unused large port hole in firewall
2023-09-08 18:22:48 +02:00
dd616a7fb1
Make exporters listen on localhost only
2023-09-08 18:13:04 +02:00
e41404f619
Allow only some ports for srun
2023-09-08 17:51:37 +02:00
1c7ce3fc51
Block ssfhead from reaching our slurm daemon
2023-09-08 17:20:32 +02:00
bdd03dac60
Poweroff idle slurm nodes after 1 hour
2023-09-08 13:31:23 +02:00
21b38de26d
Add IB and IPMI node host names
2023-09-08 13:21:37 +02:00
52d3794b14
flake.lock: Update
...
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2023-09-07 11:13:45 +02:00
d91c9b7473
Unlock ovni gitlab runners
2023-09-05 16:24:27 +02:00
6b526f9827
flake.lock: Update
...
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
→ 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2023-09-05 15:03:26 +02:00
ae4ad95902
Add agenix to all nodes
2023-09-04 22:09:40 +02:00
3cc7b33c5a
Add agenix module to ceph
2023-09-04 22:06:20 +02:00
8fc87885da
Remove old secrets
2023-09-04 22:04:32 +02:00
1ea8912d6c
Mount /ceph in owl1 and owl2
2023-09-04 22:00:36 +02:00
7d9e7e4e83
Warn about the owl2 omnipath device
2023-09-04 22:00:17 +02:00
779b591d40
Clean owl2 configuration
2023-09-04 21:59:56 +02:00
c13022596a
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
875622ad0f
Reorganize secrets and ssh keys
...
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
a7eddecf80
Add anavarro user
2023-09-04 16:00:01 +02:00
fcddbdb72b
Set zsh inc_append_history option
2023-09-03 16:57:53 +02:00
bfb5363d94
Set zsh shell for rarias
2023-09-03 16:46:27 +02:00
44c1d958f4
Enable zsh and fix key bindings
2023-09-03 11:51:53 +02:00