206 Commits

Author SHA1 Message Date
2703cd456d flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=f605f8e5e4a1f392589f1ea2b9ffe2074f72a538' (2023-10-31)

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
9325c203fb Switch bscpkgs URL to sourcehut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
5d1f008199 Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
f9622b19ef Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.

In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet), and the login node (ssfhead).

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
f2d26fd2e2 Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
bf8f0ac583 Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
4e333dca21 Don't log SLURM connection attempts from ssfhead 2025-10-01 16:40:16 +02:00
33c1da6c40 Add docker runner too 2025-10-01 16:40:16 +02:00
5dc41a86e5 Monitor gitlab.bsc.es too 2025-10-01 16:40:16 +02:00
697c3d884e Monitor PM webpage via blackbox 2025-10-01 16:40:16 +02:00
5a537c7478 Temporarily disable pm runners 2025-10-01 16:40:16 +02:00
c06b706e49 Add runner for gitlab.bsc.es 2025-10-01 16:40:16 +02:00
270cff123d Allow anonymous access to grafana 2025-10-01 16:40:16 +02:00
b219badaaf Remove user/group when using DynamicUsers 2025-10-01 16:40:16 +02:00
f4fcb7c72c Set the SLURM_CONF variable 2025-10-01 16:40:16 +02:00
b4ede66387 Enable slurm-exporter service 2025-10-01 16:40:16 +02:00
ec351a157c Add prometheus-slurm-exporter package 2025-10-01 16:40:16 +02:00
63d63fd39a Mount the hut nix store for SLURM jobs 2025-10-01 16:40:16 +02:00
beae9d240e Enable direnv integration 2025-10-01 16:40:16 +02:00
e925b00489 Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2025-10-01 16:40:16 +02:00
5594e3615d Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2025-10-01 16:40:16 +02:00
384c4ee766 Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2025-10-01 16:40:16 +02:00
87871de141 Don't fetch registry flakes from the net 2025-10-01 16:40:16 +02:00
c5a058f96a flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2025-10-01 16:40:16 +02:00
79077477e1 Revert "Update slurm to 23.02.05.1"
This reverts commit aaefddc44a9073166ac52b8bd56ac96258d3b053.
2025-10-01 16:40:16 +02:00
f5a6055f21 Open ports in firewall of compute nodes 2025-10-01 16:40:16 +02:00
0333b57851 Update slurm to 23.02.05.1 2025-10-01 16:40:16 +02:00
00068cb11c Monitor storage nodes via IPMI too 2025-10-01 16:40:16 +02:00
a992b266bb Enable fstrim service 2025-10-01 16:40:16 +02:00
c26cff7bdb Serve the nix store from hut 2025-10-01 16:40:16 +02:00
1b5469af13 Add encrypted munge key with agenix 2025-10-01 16:40:16 +02:00
78c883a274 Remove unused large port hole in firewall 2025-10-01 16:40:16 +02:00
3385252f5f Make exporters listen on localhost only 2025-10-01 16:40:16 +02:00
241b888a7c Allow only some ports for srun 2025-10-01 16:40:16 +02:00
b7aba3d15c Block ssfhead from reaching our slurm daemon 2025-10-01 16:40:16 +02:00
e35b51cd00 Poweroff idle slurm nodes after 1 hour 2025-10-01 16:40:16 +02:00
2e460f49bd Add IB and IPMI node host names 2025-10-01 16:40:16 +02:00
a13a2caf57 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2025-10-01 16:40:16 +02:00
ac3817d99b Unlock ovni gitlab runners 2025-10-01 16:40:16 +02:00
c1d9b01ed1 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
  → 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2025-10-01 16:40:16 +02:00
e7aa2d3fe3 Add agenix to all nodes 2025-10-01 16:40:16 +02:00
d2860ce437 Add agenix module to ceph 2025-10-01 16:40:16 +02:00
1f199c73f1 Remove old secrets 2025-10-01 16:40:16 +02:00
657a1b328a Mount /ceph in owl1 and owl2 2025-10-01 16:40:16 +02:00
875e6fe6c7 Warn about the owl2 omnipath device 2025-10-01 16:40:16 +02:00
7abee55da4 Clean owl2 configuration 2025-10-01 16:40:16 +02:00
758ddc71cb Move the ceph client config to an external module 2025-10-01 16:40:16 +02:00
224bafd20d Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2025-10-01 16:40:16 +02:00
8b1fa938ea Add anavarro user 2025-10-01 16:40:16 +02:00
94b110dc57 Set zsh inc_append_history option 2025-10-01 16:40:16 +02:00