35a94a9b02
Enable runners for pm.bsc.es/gitlab too
The old runners for the PM gitlab were disabled in the configuration
during the last outage, but they kept working until we rebooted the node.
With this change we enable the runners for both PM and gitlab.bsc.es.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 14:45:23 +01:00
b6bd31e159
Remove complete ceph package from hut
Only the ceph-client is needed.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:58:54 +01:00
1d4badda5b
Fix warning in slurm exporter using vendorHash
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:58:50 +01:00
bd5214a3b9
Remove old Ceph package overlay
The Ceph package is now integrated in upstream nixpkgs.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:58:47 +01:00
c32f6dea97
flake.lock: Update
Flake lock file updates:
• Updated input 'agenix':
'github:ryantm/agenix/d8c973fd228949736dedf61b7f8cc1ece3236792' (2023-07-24)
→ 'github:ryantm/agenix/daf42cb35b2dc614d1551e37f96406e4c4a2d3e4' (2023-10-08)
• Updated input 'bscpkgs':
'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=f605f8e5e4a1f392589f1ea2b9ffe2074f72a538' (2023-10-31)
→ 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=e148de50d68b3eeafc3389b331cf042075971c4b' (2023-11-22)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
→ 'github:NixOS/nixpkgs/e4ad989506ec7d71f7302cc3067abd82730a4beb' (2023-11-19)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:57:44 +01:00
dd341902fc
BSC packages are no longer in bsc attribute
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
190e273112
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
→ 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=f605f8e5e4a1f392589f1ea2b9ffe2074f72a538' (2023-10-31)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
268807d1d0
Switch bscpkgs URL to sourcehut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
2953080fb8
Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-27 12:46:08 +02:00
9871517be2
Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.
In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to
ping only from the intranet) and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
736eacaac5
Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
0e66aad099
Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
67a4905a0a
Don't log SLURM connection attempts from ssfhead
2023-10-06 15:22:04 +02:00
d52d22e0db
Add docker runner too
2023-10-06 15:17:07 +02:00
42920c2521
Monitor gitlab.bsc.es too
2023-10-06 15:17:07 +02:00
4acd35e036
Monitor PM webpage via blackbox
2023-10-06 15:17:07 +02:00
621d20db3a
Temporarily disable pm runners
2023-10-06 15:17:07 +02:00
0926f6ec1f
Add runner for gitlab.bsc.es
2023-10-06 15:17:07 +02:00
61646cb3bd
Allow anonymous access to grafana
2023-09-22 10:51:30 +02:00
c0066c4744
Remove user/group when using DynamicUsers
2023-09-22 10:13:06 +02:00
ffd0593f51
Set the SLURM_CONF variable
2023-09-21 22:22:00 +02:00
f49ae0773e
Enable slurm-exporter service
2023-09-21 21:40:02 +02:00
8fa3fccecb
Add prometheus-slurm-exporter package
2023-09-21 21:34:18 +02:00
9ee7111453
Document the hut shared nix store for SLURM
2023-09-21 13:51:42 +02:00
8de3d2b149
Mount the hut nix store for SLURM jobs
2023-09-20 19:38:43 +02:00
bc62e28ca3
Enable direnv integration
2023-09-20 09:32:58 +02:00
d612a5453c
Add System Integration Service Guide document
2023-09-19 15:12:59 +02:00
653d411b9e
Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
51c57dbc41
Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2023-09-15 12:00:33 +02:00
33cd40160e
Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2023-09-15 12:00:28 +02:00
a1e8cfea47
Don't fetch registry flakes from the net
2023-09-15 12:00:28 +02:00
5d72ee3da3
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2023-09-15 11:50:47 +02:00
fdc6445d47
Revert "Update slurm to 23.02.05.1"
This reverts commit aaefddc44a.
2023-09-14 15:46:18 +02:00
e88805947e
Open ports in firewall of compute nodes
2023-09-14 15:45:43 +02:00
aaefddc44a
Update slurm to 23.02.05.1
2023-09-13 17:44:24 +02:00
d9d249411d
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
c07f75c6bb
Specify the space available in /ceph
2023-09-13 14:19:59 +02:00
8d449ba20c
Add update post to website
2023-09-12 18:13:38 +02:00
10ca572aec
Enable fstrim service
2023-09-12 16:39:45 +02:00
75b0f48715
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
19a451db77
Add encrypted munge key with agenix
2023-09-08 19:05:45 +02:00
ec9be9bb62
Remove unused large port hole in firewall
2023-09-08 18:22:48 +02:00
7ddd1977f3
Make exporters listen on localhost only
2023-09-08 18:13:04 +02:00
7050c505b5
Allow only some ports for srun
2023-09-08 17:51:37 +02:00
033a1fe97b
Block ssfhead from reaching our slurm daemon
2023-09-08 17:36:28 +02:00
77cb3c494e
Poweroff idle slurm nodes after 1 hour
2023-09-08 16:49:53 +02:00
6db5772ac4
Add IB and IPMI node host names
2023-09-08 13:21:37 +02:00
3e347e673c
flake.lock: Update
Flake lock file updates:
• Updated input 'bscpkgs':
'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
→ 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2023-09-07 11:13:45 +02:00
dca274d020
Unlock ovni gitlab runners
2023-09-05 16:59:45 +02:00
c33909f32f
Update email contact to jungle mail list
2023-09-05 16:10:58 +02:00