dcffeed542
Mount home via NFS from apex in fox
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 13:24:06 +02:00
a22d0d4135
Allow access to NFS via wireguard subnet
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 13:16:27 +02:00
7d4ebd8495
Use 10.106.0.0/24 subnet to avoid collisions
...
The 106 byte is the code for 'j' (jungle) in ASCII:
% printf j | od -t d
0000000 106
0000001
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 11:12:25 +02:00
3a917f75c7
Revert "Remove pam_slurm_adopt from fox"
...
This reverts commit 64a52801ed8d5c4a57650c2c434254a9986c1901.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-02 17:12:56 +02:00
7657b860a8
Enable fail2ban in fox
...
Protect fox against ssh bruteforce attacks:
fox% sudo lastb | head
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:24 - 11:24 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:24 - 11:24 (00:00)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-01 11:25:29 +02:00
50ae3ab4f0
Accept connections from apex to fox slurmd
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 14:55:53 +02:00
02e2470c1a
Accept fox connection to slurm controller
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 14:46:24 +02:00
3f67bc4a2e
Add fox machine to SLURM
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 14:40:43 +02:00
71a23ec68b
Rekey secrets with trusted fox key
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 14:39:28 +02:00
11f52da199
Trust fox for compute node secrets
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 14:35:51 +02:00
f1a98190b5
Make apex host specific to each machine
...
Allows direct contact via the VPN when accessing from fox, but use
Internet when using the rest of the machines.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 14:29:14 +02:00
2fbf3ee8b6
Add local host fox in apex
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 14:11:19 +02:00
dd4ad901df
Enable wireguard in apex
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 13:52:05 +02:00
c9669408c5
Add wireguard server in fox
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-29 13:38:47 +02:00
ddfb26be5a
Use writeShellScript for suspend.sh and resume.sh
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:02:12 +02:00
1b21a398a8
Add firewall rules to slurm server
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-27 12:59:21 +02:00
4d16e794cd
Remove hut from slurm
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-27 12:43:12 +02:00
38a45f20b4
Only configure apex as slurm server
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-27 12:37:21 +02:00
0cc76fc98d
Split slurm configuration for client and server
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-27 12:36:52 +02:00
70da186d15
Move slurm control server to apex
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-27 11:56:20 +02:00
d71831016e
Fix typo in csiringo ssh key
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 17:21:23 +02:00
0fb3cec09c
Enable nix-ld in weasel
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-16 16:20:40 +02:00
5ccfc2411f
Add csiringo user with access to apex and weasel
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 12:42:08 +02:00
dbb7e1fe36
Access gitlab via raccoon in fox
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-27 15:20:34 +02:00
d1f58a62f5
Move StartLimit* options to unit section
...
The StartLimitBurst and StartLimitIntervalSec options belong to the
[Unit] section, otherwise they are ignored in [Service]:
> Unknown key 'StartLimitIntervalSec' in section [Service], ignoring.
When using [Unit], the limits are properly set:
apex% systemctl show power-policy.service | grep StartLimit
StartLimitIntervalUSec=10min
StartLimitBurst=10
StartLimitAction=none
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 12:21:05 +02:00
0642df0bbd
Set power policy to always turn on
...
In all machines, as soon as we recover the power, turn the machine back
on. We cannot rely on the previous state as we will shut them down
before the power is cut to prevent damage on the power supply
monitoring circuit.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 15:25:47 +02:00
3d7e8b8a07
Add NixOS module to control power policy
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 14:07:06 +02:00
2e429bf09e
Move August shutdown to 3rd at 22h
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 13:42:57 +02:00
9e22760628
Disable automatic August shutdown for Fox
...
The UPC has different dates for the yearly power cut, and Fox can
recover properly from a power loss, so we don't need to have it turned
off before the power cut. Simply disabling the timer is enough.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 13:40:33 +02:00
8bb09dd061
Add cudainfo program to test CUDA
...
The cudainfo program checks that we can initialize the CUDA RT library
and communicate with the driver. It can be used as standalone program or
built with cudainfo.gpuCheck so it is executed inside the build sandbox
to see if it also works fine. It uses the autoAddDriverRunpath hook to
inject in the runpath the location of the library directory for CUDA
libraries.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-22 15:24:55 +02:00
f686797234
Add missing symlink in cuda sandbox
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:19:25 +02:00
6411a94f77
Enable cuda systemFeature in raccoon and fox
...
This allows running derivations which depend on cuda runtime without
breaking the sandbox. We only need to add `requiredSystemFeatures = [ "cuda" ];`
to the derivation.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-18 11:34:28 +02:00
7b61cfbe54
Move shared nvidia settings to a separate module
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-18 11:31:59 +02:00
4e1fd7b0e0
Replace xeon07 by hut in ssh config
...
The xeon07 machine has been renamed to hut.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-18 10:59:39 +02:00
4e24135d35
Enable automatic Nix GC in raccoon
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 13:43:58 +02:00
7131d82ba2
Select proprietary NVIDIA driver in raccoon
...
The NVIDIA GTX 960 from 2016 has the Maxwell architecture, and NixOS
suggests using the proprietary driver for older than Turing:
> It is suggested to use the open source kernel modules on Turing or
> later GPUs (RTX series, GTX 16xx), and the closed source modules
> otherwise.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 13:00:03 +02:00
e8cd0d9f58
Enable open source NVidia driver in fox
...
It is recommended for newer versions.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-17 11:32:35 +02:00
a9ba65cdca
Remove option allowUnfree from fox and raccoon
...
It is already set to true for all machines.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-17 11:26:27 +02:00
94f398e661
Ban another scanner trying to connect via SSH
...
It is constantly spamming out logs:
apex# journalctl | grep 'Connection closed by 84.88.52.176' | wc -l
2255
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-16 16:59:29 +02:00
387e1cada7
Update weasel IPMI hostname for monitoring
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 18:48:08 +02:00
c6cc2a7638
Remove merged MPICH patch
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-15 17:57:22 +02:00
29071a6020
Remove package ix as it is gone
...
Fails with: "error: ix has been removed from Nixpkgs, as the ix.io
pastebin has been offline since Dec. 2023".
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-15 17:50:12 +02:00
f59218c898
flake.lock: Update
...
Flake lock file updates:
• Updated input 'agenix':
'github:ryantm/agenix/f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41?narHash=sha256-b%2Buqzj%2BWa6xgMS9aNbX4I%2BsXeb5biPDi39VgvSFqFvU%3D' (2024-08-10)
→ 'github:ryantm/agenix/531beac616433bac6f9e2a19feb8e99a22a66baf?narHash=sha256-9P1FziAwl5%2B3edkfFcr5HeGtQUtrSdk/MksX39GieoA%3D' (2025-06-17)
• Updated input 'agenix/darwin':
'github:lnl7/nix-darwin/4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d?narHash=sha256-gzGLZSiOhf155FW7262kdHo2YDeugp3VuIFb4/GGng0%3D' (2023-11-24)
→ 'github:lnl7/nix-darwin/43975d782b418ebf4969e9ccba82466728c2851b?narHash=sha256-dyN%2BteG9G82G%2Bm%2BPX/aSAagkC%2BvUv0SgUw3XkPhQodQ%3D' (2025-04-12)
• Updated input 'agenix/home-manager':
'github:nix-community/home-manager/3bfaacf46133c037bb356193bd2f1765d9dc82c1?narHash=sha256-7ulcXOk63TIT2lVDSExj7XzFx09LpdSAPtvgtM7yQPE%3D' (2023-12-20)
→ 'github:nix-community/home-manager/abfad3d2958c9e6300a883bd443512c55dfeb1be?narHash=sha256-YZCh2o9Ua1n9uCvrvi5pRxtuVNml8X2a03qIFfRKpFs%3D' (2025-04-24)
• Updated input 'bscpkgs':
'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f ' (2024-11-29)
→ 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=9d1944c658929b6f98b3f3803fead4d1b91c4405 ' (2025-06-11)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc?narHash=sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8%3D' (2025-01-14)
→ 'github:NixOS/nixpkgs/dfcd5b901dbab46c9c6e80b265648481aafb01f8?narHash=sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw%3D' (2025-07-13)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-15 17:46:48 +02:00
871515a736
Upgrade nixpkgs to nixos 25.05
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-15 17:45:40 +02:00
ef65a49ed1
Silently ban OpenVAS BSC scanner from apex
...
It is spamming our logs with refused connection lines:
apex% sudo journalctl -b0 | grep 'refused connection.*SRC=192.168.8.16' | wc -l
13945
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:30:20 +02:00
061bd24453
Rotate anavarro password and SSH key
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:15:59 +02:00
0a876e7a83
Add weasel machine configuration
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 15:07:52 +02:00
ba425f6647
Remove extra flush commands on firewall stop
...
They are not needed as they are already flushed when the firewall
starts or stops.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-11 16:13:35 +02:00
5a4e7d2bdf
Prevent accidental use of nftables
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-11 16:12:44 +02:00
998a7f839d
Add proxy configuration for internal hosts
...
Access internal hosts via apex proxy. From the compute nodes we first
open an SSH connection to apex, and then tunnel it through the HTTP
proxy with netcat.
This way we allow reaching internal GitLab repositories without
requiring the user to have credentials in the remote host, while we can
use multiple remotes to provide redundancy.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-11 12:29:52 +02:00