84c4b6b81c
Switch bscpkgs URL to sourcehut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
19e195b894
Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login node and the provider, and between the provider and
external hosts like the Google or Cloudflare DNS servers.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-27 12:46:08 +02:00
54c2bd119f
Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.
In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet), and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
e5d85c1b38
Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
f1486b84c1
Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
472f4b0334
Don't log SLURM connection attempts from ssfhead
2023-10-06 15:22:04 +02:00
425dca3e00
Add docker runner too
2023-10-06 15:17:07 +02:00
e4080cf931
Monitor gitlab.bsc.es too
2023-10-06 15:17:07 +02:00
fc9285f89d
Monitor PM webpage via blackbox
2023-10-06 15:17:07 +02:00
fbe238f5b6
Temporarily disable pm runners
2023-10-06 15:17:07 +02:00
9874da566d
Add runner for gitlab.bsc.es
2023-10-06 15:17:07 +02:00
ebc5c4d84f
Allow anonymous access to grafana
2023-09-22 10:51:30 +02:00
8634a9e133
Remove user/group when using DynamicUser
2023-09-22 10:13:06 +02:00
0ce79ed79e
Set the SLURM_CONF variable
2023-09-21 22:22:00 +02:00
5f492ee1d7
Enable slurm-exporter service
2023-09-21 21:40:02 +02:00
9071a4de8b
Add prometheus-slurm-exporter package
2023-09-21 21:34:18 +02:00
3040a803b2
Mount the hut nix store for SLURM jobs
2023-09-20 19:38:43 +02:00
70a9e855cf
Enable direnv integration
2023-09-20 09:32:58 +02:00
aa64e9ef24
Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead, use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
ba2b74fd5a
Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2023-09-15 12:00:33 +02:00
1ae5d9e25e
Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2023-09-15 12:00:28 +02:00
ff98ba47c4
Don't fetch registry flakes from the net
2023-09-15 12:00:28 +02:00
599b23ef52
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2023-09-15 11:50:47 +02:00
8dbee06d1d
Revert "Update slurm to 23.02.05.1"
This reverts commit 7bfd786c01c36131cd00b90fc6a9503fd1226578.
2023-09-14 15:46:18 +02:00
d522113cb9
Open ports in firewall of compute nodes
2023-09-14 15:45:43 +02:00
7bfd786c01
Update slurm to 23.02.05.1
2023-09-13 17:44:24 +02:00
5a5f4672cd
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
2646ad4b70
Enable fstrim service
2023-09-12 16:39:45 +02:00
b120a7ca85
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
2a0254b684
Add encrypted munge key with agenix
2023-09-08 19:05:45 +02:00
e3e6e7662d
Remove unused large port hole in firewall
2023-09-08 18:22:48 +02:00
868f825e26
Make exporters listen on localhost only
2023-09-08 18:13:04 +02:00
f231dc81f1
Allow only some ports for srun
2023-09-08 17:51:37 +02:00
a758eef354
Block ssfhead from reaching our slurm daemon
2023-09-08 17:36:28 +02:00
9c9c41fb57
Power off idle slurm nodes after 1 hour
2023-09-08 16:49:53 +02:00
1a1708f16f
Add IB and IPMI node host names
2023-09-08 13:21:37 +02:00
efe1b7e399
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2023-09-07 11:13:45 +02:00
eb9876aff6
Unlock ovni gitlab runners
2023-09-05 16:59:45 +02:00
8d31c552f5
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
→ 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2023-09-05 15:03:26 +02:00
68f4d54dd1
Add agenix to all nodes
2023-09-04 22:10:43 +02:00
2042d58b72
Add agenix module to ceph
2023-09-04 22:07:07 +02:00
2c8c90e6e4
Remove old secrets
2023-09-04 22:04:32 +02:00
208dcb7dde
Mount /ceph in owl1 and owl2
2023-09-04 22:00:36 +02:00
e2f82a6383
Warn about the owl2 omnipath device
2023-09-04 22:00:17 +02:00
d704816de9
Clean owl2 configuration
2023-09-04 21:59:56 +02:00
74ec4eb22a
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
0a5f9b55f5
Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
900de39e2f
Add anavarro user
2023-09-04 16:00:01 +02:00
1e466d07df
Set zsh inc_append_history option
2023-09-03 16:57:53 +02:00
13807c5e8f
Set zsh shell for rarias
2023-09-03 16:46:27 +02:00