405 Commits

Author SHA1 Message Date
84c4b6b81c Switch bscpkgs URL to sourcehut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
19e195b894 Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-27 12:46:08 +02:00
54c2bd119f Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.

In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway which responds to ping
only from the intranet and the login node (ssfhead).

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
e5d85c1b38 Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
f1486b84c1 Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again. So better if we don't rely on it.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
472f4b0334 Don't log SLURM connection attempts from ssfhead 2023-10-06 15:22:04 +02:00
425dca3e00 Add docker runner too 2023-10-06 15:17:07 +02:00
e4080cf931 Monitor gitlab.bsc.es too 2023-10-06 15:17:07 +02:00
fc9285f89d Monitor PM webpage via blackbox 2023-10-06 15:17:07 +02:00
fbe238f5b6 Temporarily disable pm runners 2023-10-06 15:17:07 +02:00
9874da566d Add runner for gitlab.bsc.es 2023-10-06 15:17:07 +02:00
ebc5c4d84f Allow anonymous access to grafana 2023-09-22 10:51:30 +02:00
8634a9e133 Remove user/group when using DynamicUsers 2023-09-22 10:13:06 +02:00
0ce79ed79e Set the SLURM_CONF variable 2023-09-21 22:22:00 +02:00
5f492ee1d7 Enable slurm-exporter service 2023-09-21 21:40:02 +02:00
9071a4de8b Add prometheus-slurm-exporter package 2023-09-21 21:34:18 +02:00
3040a803b2 Mount the hut nix store for SLURM jobs 2023-09-20 19:38:43 +02:00
70a9e855cf Enable direnv integration 2023-09-20 09:32:58 +02:00
aa64e9ef24 Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version that the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
ba2b74fd5a Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2023-09-15 12:00:33 +02:00
1ae5d9e25e Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2023-09-15 12:00:28 +02:00
ff98ba47c4 Don't fetch registry flakes from the net 2023-09-15 12:00:28 +02:00
599b23ef52 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2023-09-15 11:50:47 +02:00
8dbee06d1d Revert "Update slurm to 23.02.05.1"
This reverts commit 7bfd786c01c36131cd00b90fc6a9503fd1226578.
2023-09-14 15:46:18 +02:00
d522113cb9 Open ports in firewall of compute nodes 2023-09-14 15:45:43 +02:00
7bfd786c01 Update slurm to 23.02.05.1 2023-09-13 17:44:24 +02:00
5a5f4672cd Monitor storage nodes via IPMI too 2023-09-13 15:57:13 +02:00
2646ad4b70 Enable fstrim service 2023-09-12 16:39:45 +02:00
b120a7ca85 Serve the nix store from hut 2023-09-12 12:19:43 +02:00
2a0254b684 Add encrypted munge key with agenix 2023-09-08 19:05:45 +02:00
e3e6e7662d Remove unused large port hole in firewall 2023-09-08 18:22:48 +02:00
868f825e26 Make exporters listen in localhost only 2023-09-08 18:13:04 +02:00
f231dc81f1 Allow only some ports for srun 2023-09-08 17:51:37 +02:00
a758eef354 Block ssfhead from reaching our slurm daemon 2023-09-08 17:36:28 +02:00
9c9c41fb57 Poweroff idle slurm nodes after 1 hour 2023-09-08 16:49:53 +02:00
1a1708f16f Add IB and IPMI node host names 2023-09-08 13:21:37 +02:00
efe1b7e399 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2023-09-07 11:13:45 +02:00
eb9876aff6 Unlock ovni gitlab runners 2023-09-05 16:59:45 +02:00
8d31c552f5 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
  → 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2023-09-05 15:03:26 +02:00
68f4d54dd1 Add agenix to all nodes 2023-09-04 22:10:43 +02:00
2042d58b72 Add agenix module to ceph 2023-09-04 22:07:07 +02:00
2c8c90e6e4 Remove old secrets 2023-09-04 22:04:32 +02:00
208dcb7dde Mount /ceph in owl1 and owl2 2023-09-04 22:00:36 +02:00
e2f82a6383 Warn about the owl2 omnipath device 2023-09-04 22:00:17 +02:00
d704816de9 Clean owl2 configuration 2023-09-04 21:59:56 +02:00
74ec4eb22a Move the ceph client config to an external module 2023-09-04 21:59:04 +02:00
0a5f9b55f5 Reorganize secrets and ssh keys
The agenix tools needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
900de39e2f Add anavarro user 2023-09-04 16:00:01 +02:00
1e466d07df Set zsh inc_append_history option 2023-09-03 16:57:53 +02:00
13807c5e8f Set zsh shell for rarias 2023-09-03 16:46:27 +02:00