a402bc880c
Remove machine access for user csiringo
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-29 18:23:24 +02:00
9c3fbc0ec9
Mount apex /home via NFS in raccoon
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:53 +02:00
3f8e6b9fcd
Remove extra SSH jump configuration
...
We now have direct visibility among nodes so we don't need any extra
SSH configuration to reach them.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:51 +02:00
08e4dda6d2
Add raccoon peer to wireguard
...
It routes traffic from fox, apex and the compute nodes so that we can
reach the git servers and tent.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:48 +02:00
3380ec5e05
Restrict fox peer to a single IP
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:43 +02:00
e934a2bc9d
Use lowercase peer hostnames
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:25 +02:00
3387cbcc25
Share a public folder for documents
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:59:40 +02:00
ac5f4e4dca
Add amd_hsmp module in fox for AMD uProf
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:24 +02:00
cad88f92a8
Disable NMI watchdog in fox
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:17 +02:00
3ab0e13960
Add AMD uProf module and enable it in fox
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:05 +02:00
2ed881cd89
Mount home via NFS from apex in fox
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 15:34:02 +02:00
2a07df1d30
Allow access to NFS via wireguard subnet
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 15:33:47 +02:00
52380eae59
Use 10.106.0.0/24 subnet to avoid collisions
...
The 106 byte is the code for 'j' (jungle) in ASCII:
% printf j | od -t d
0000000 106
0000001
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:13 +02:00
3b16b41be3
Revert "Remove pam_slurm_adopt from fox"
...
This reverts commit 64a52801ed8d5c4a57650c2c434254a9986c1901.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:06 +02:00
ee481deffb
Enable fail2ban in fox
...
Protect fox against ssh bruteforce attacks:
fox% sudo lastb | head
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:25 - 11:25 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:24 - 11:24 (00:00)
root ssh:notty 200.124.28.102 Mon Sep 1 11:24 - 11:24 (00:00)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:02 +02:00
b1bad25008
Accept connections from apex to fox slurmd
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:00 +02:00
85f38e17a2
Accept fox connection to slurm controller
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:59 +02:00
08ab01b89c
Add fox machine to SLURM
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:57 +02:00
e7490858c6
Make apex host specific to each machine
...
Allows direct contact via the VPN when accessing from fox, but use
Internet when using the rest of the machines.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:49 +02:00
7606030135
Add local host fox in apex
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:46 +02:00
e55590f59e
Enable wireguard in apex
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:43 +02:00
c3da39c392
Add wireguard server in fox
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:38 +02:00
d3889b3339
Use writeShellScript for suspend.sh and resume.sh
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:28 +02:00
28540d8cf3
Add firewall rules to slurm server
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:26 +02:00
f847621ceb
Remove hut from slurm
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:24 +02:00
12fe43f95f
Only configure apex as slurm server
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:22 +02:00
0e8329eef3
Split slurm configuration for client and server
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:20 +02:00
df3b21b570
Move slurm control server to apex
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:16 +02:00
78df61d24a
Fix typo in csiringo ssh key
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 17:44:20 +02:00
8e7da73151
Enable nix-ld in weasel
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 16:19:34 +02:00
a7e17e40dc
Add csiringo user with access to apex and weasel
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 16:02:26 +02:00
0e8bd22347
Access gitlab via raccoon in fox
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-27 15:27:38 +02:00
d948f8b752
Move StartLimit* options to unit section
...
The StartLimitBurst and StartLimitIntervalSec options belong to the
[Unit] section, otherwise they are ignored in [Service]:
> Unknown key 'StartLimitIntervalSec' in section [Service], ignoring.
When using [Unit], the limits are properly set:
apex% systemctl show power-policy.service | grep StartLimit
StartLimitIntervalUSec=10min
StartLimitBurst=10
StartLimitAction=none
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 14:32:46 +02:00
8f7787e217
Set power policy to always turn on
...
In all machines, as soon as we recover the power, turn the machine back
on. We cannot rely on the previous state as we will shut them down
before the power is cut to prevent damage on the power supply
monitoring circuit.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:38 +02:00
30b9b23112
Add NixOS module to control power policy
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:36 +02:00
9a056737de
Move August shutdown to 3rd at 22h
...
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:33 +02:00
ac700d34a5
Disable automatic August shutdown for Fox
...
The UPC has different dates for the yearly power cut, and Fox can
recover properly from a power loss, so we don't need to have it turned
off before the power cut. Simply disabling the timer is enough.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:10 +02:00
9b681ab7ce
Add cudainfo program to test CUDA
...
The cudainfo program checks that we can initialize the CUDA RT library
and communicate with the driver. It can be used as standalone program or
built with cudainfo.gpuCheck so it is executed inside the build sandbox
to see if it also works fine. It uses the autoAddDriverRunpath hook to
inject in the runpath the location of the library directory for CUDA
libraries.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:52:09 +02:00
9ce394bffd
Add missing symlink in cuda sandbox
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:51:47 +02:00
8cd7b713ca
Enable cuda systemFeature in raccoon and fox
...
This allows running derivations which depend on cuda runtime without
breaking the sandbox. We only need to add `requiredSystemFeatures = [ "cuda" ];`
to the derivation.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:07:13 +02:00
8eed90d2bd
Move shared nvidia settings to a separate module
...
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:06:45 +02:00
aee54ef39f
Replace xeon07 by hut in ssh config
...
The xeon07 machine has been renamed to hut.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 18:10:08 +02:00
69f7ab701b
Enable automatic Nix GC in raccoon
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:26 +02:00
4c9bcebcdc
Select proprietary NVIDIA driver in raccoon
...
The NVIDIA GTX 960 from 2016 has the Maxwell architecture, and NixOS
suggests using the proprietary driver for older than Turing:
> It is suggested to use the open source kernel modules on Turing or
> later GPUs (RTX series, GTX 16xx), and the closed source modules
> otherwise.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:21 +02:00
86e7c72b9b
Enable open source NVidia driver in fox
...
It is recommended for newer versions.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:38 +02:00
a7dffc33b5
Remove option allowUnfree from fox and raccoon
...
It is already set to true for all machines.
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:21 +02:00
6765dba3e4
Ban another scanner trying to connect via SSH
...
It is constantly spamming out logs:
apex# journalctl | grep 'Connection closed by 84.88.52.176' | wc -l
2255
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:51:49 +02:00
0acfb7a8e0
Update weasel IPMI hostname for monitoring
...
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:51:21 +02:00
2bb3b2fc4a
Remove package ix as it is gone
...
Fails with: "error: ix has been removed from Nixpkgs, as the ix.io
pastebin has been offline since Dec. 2023".
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:07:06 +02:00
a6698e6a6b
Silently ban OpenVAS BSC scanner from apex
...
It is spamming our logs with refused connection lines:
apex% sudo journalctl -b0 | grep 'refused connection.*SRC=192.168.8.16' | wc -l
13945
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:40:41 +02:00