204 Commits

Author SHA1 Message Date
c307fc9bb3 Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-26 12:36:06 +02:00
6f5f234480 Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.

In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet), and the login node (ssfhead).

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-24 11:49:42 +02:00
1e9bc4086f Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-20 16:04:15 +02:00
734f52e87f Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-20 15:34:06 +02:00
18908c3019 Don't log SLURM connection attempts from ssfhead 2023-10-04 08:19:09 +02:00
72658ee5e6 Add docker runner too 2023-10-04 07:55:26 +02:00
cfa3e08e4b Monitor gitlab.bsc.es too 2023-10-03 09:45:13 +02:00
10101c631d Monitor PM webpage via blackbox 2023-10-03 08:58:07 +02:00
4d865d7a7e Temporarily disable pm runners 2023-09-28 14:14:41 +02:00
d9511dab22 Add runner for gitlab.bsc.es 2023-09-28 14:11:30 +02:00
c3ecba513d Allow anonymous access to grafana 2023-09-22 10:50:14 +02:00
24c05e5ebf Remove user/group when using DynamicUsers 2023-09-22 10:13:06 +02:00
7aef154dd4 Set the SLURM_CONF variable 2023-09-21 22:18:30 +02:00
4ca4e0fae9 Enable slurm-exporter service 2023-09-21 21:38:34 +02:00
7b686d0ea4 Add prometheus-slurm-exporter package 2023-09-21 21:34:18 +02:00
d4c803dbfb Mount the hut nix store for SLURM jobs 2023-09-20 18:26:48 +02:00
94ead9b759 Enable direnv integration 2023-09-17 22:27:51 +02:00
e0b3dd961c Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 11:58:47 +02:00
656de00d65 Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2023-09-15 11:58:10 +02:00
fefdbe9c55 Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2023-09-14 18:28:09 +02:00
c73a337471 Don't fetch registry flakes from the net 2023-09-15 09:13:24 +02:00
dbd57ed57f flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2023-09-14 18:09:05 +02:00
010491618e Revert "Update slurm to 23.02.05.1"
This reverts commit aaefddc44a9073166ac52b8bd56ac96258d3b053.
2023-09-14 15:46:18 +02:00
722c0b0eaa Open ports in firewall of compute nodes 2023-09-14 15:45:43 +02:00
772e0f00fb Update slurm to 23.02.05.1 2023-09-13 17:44:24 +02:00
de3a28b7df Monitor storage nodes via IPMI too 2023-09-13 15:57:13 +02:00
a05d87d4b9 Enable fstrim service 2023-09-12 16:39:45 +02:00
826d6263fd Serve the nix store from hut 2023-09-12 12:19:43 +02:00
b0b04e8fb1 Add encrypted munge key with agenix 2023-09-08 19:01:57 +02:00
a5e81fea95 Remove unused large port hole in firewall 2023-09-08 18:22:48 +02:00
dd616a7fb1 Make exporters listen on localhost only 2023-09-08 18:13:04 +02:00
e41404f619 Allow only some ports for srun 2023-09-08 17:51:37 +02:00
1c7ce3fc51 Block ssfhead from reaching our slurm daemon 2023-09-08 17:20:32 +02:00
bdd03dac60 Poweroff idle slurm nodes after 1 hour 2023-09-08 13:31:23 +02:00
21b38de26d Add IB and IPMI node host names 2023-09-08 13:21:37 +02:00
52d3794b14 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2023-09-07 11:13:45 +02:00
d91c9b7473 Unlock ovni gitlab runners 2023-09-05 16:24:27 +02:00
6b526f9827 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
  → 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2023-09-05 15:03:26 +02:00
ae4ad95902 Add agenix to all nodes 2023-09-04 22:09:40 +02:00
3cc7b33c5a Add agenix module to ceph 2023-09-04 22:06:20 +02:00
8fc87885da Remove old secrets 2023-09-04 22:04:32 +02:00
1ea8912d6c Mount /ceph in owl1 and owl2 2023-09-04 22:00:36 +02:00
7d9e7e4e83 Warn about the owl2 omnipath device 2023-09-04 22:00:17 +02:00
779b591d40 Clean owl2 configuration 2023-09-04 21:59:56 +02:00
c13022596a Move the ceph client config to an external module 2023-09-04 21:59:04 +02:00
875622ad0f Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
a7eddecf80 Add anavarro user 2023-09-04 16:00:01 +02:00
fcddbdb72b Set zsh inc_append_history option 2023-09-03 16:57:53 +02:00
bfb5363d94 Set zsh shell for rarias 2023-09-03 16:46:27 +02:00
44c1d958f4 Enable zsh and fix key bindings 2023-09-03 11:51:53 +02:00