188ba6df0a 
							
						 
					 
					
						
						
							
							Remove bscpkgs input  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-10-07 16:07:26 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							e42058f08b 
							
						 
					 
					
						
						
							
							Allow access to hut from fox  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-10-02 17:03:21 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f3bfe89f27 
							
						 
					 
					
						
						
							
							Fetch website from its own git repository  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-10-02 15:45:21 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							b040bebd1d 
							
						 
					 
					
						
						
							
							Add acinca user  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-10-01 12:27:43 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f69629d2da 
							
						 
					 
					
						
						
							
							Restart slurmd on failure  
						
						
						
						
						A failure to reach the control node can cause slurmd to fail and the
unit remains in the failed state until it is manually restarted. Instead,
try to restart the service every 30 seconds, forever:
    owl1% systemctl show slurmd | grep -E 'Restart=|RestartUSec='
    Restart=on-failure
    RestartUSec=30s
    owl1% pgrep slurmd
    5903
    owl1% sudo kill -SEGV 5903
    owl1% pgrep slurmd
    6137
Fixes: #177 
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
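
The settings shown by `systemctl show` above could be produced by a NixOS fragment along these lines. This is a minimal sketch, not the actual change: only the unit name `slurmd` and the two values come from the commit message, and the module layout is an assumption.

```nix
{ ... }:
{
  # Assumed module layout; only the unit name and the values below are
  # taken from the commit message.
  systemd.services.slurmd.serviceConfig = {
    # Restart the daemon whenever it exits with an error or is killed by
    # a signal such as SIGSEGV.
    Restart = "on-failure";
    # Wait 30 seconds between attempts. With systemd's default start rate
    # limit (5 starts within 10 seconds) this delay should never trip the
    # limit, so the retries continue indefinitely.
    RestartSec = 30;
  };
}
```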
						
						
					 
					
						2025-09-30 17:20:39 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							0668f0db74 
							
						 
					 
					
						
						
							
							Lower connect timeout when using hut substituter  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-09-29 18:44:48 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							5fcd57a061 
							
						 
					 
					
						
						
							
							Use hut substituter in all nodes  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-09-29 18:44:38 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							ad1544759f 
							
						 
					 
					
						
						
							
							Remove machine access for user csiringo  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-09-29 18:23:24 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							e1c950a530 
							
						 
					 
					
						
						
							
							Mount apex /home via NFS in raccoon  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-26 12:28:53 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f9632c37f8 
							
						 
					 
					
						
						
							
							Remove extra SSH jump configuration  
						
						
						
						
						We now have direct visibility among nodes so we don't need any extra
SSH configuration to reach them.
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-26 12:28:51 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							1f0cb4ae76 
							
						 
					 
					
						
						
							
							Add raccoon peer to wireguard  
						
						
						
						
						It routes traffic from fox, apex and the compute nodes so that we can
reach the git servers and tent.
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-26 12:28:48 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							e98fdb89ab 
							
						 
					 
					
						
						
							
							Restrict fox peer to a single IP  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-26 12:28:43 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							6afe05b5fd 
							
						 
					 
					
						
						
							
							Use lowercase peer hostnames  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-26 12:28:25 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							7d5aebf882 
							
						 
					 
					
						
						
							
							Share a public folder for documents  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-19 10:59:40 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							4da7780472 
							
						 
					 
					
						
						
							
							Add amd_hsmp module in fox for AMD uProf  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-19 10:54:24 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							d6126501ba 
							
						 
					 
					
						
						
							
							Disable NMI watchdog in fox  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-19 10:54:17 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							e6e4846529 
							
						 
					 
					
						
						
							
							Add AMD uProf module and enable it in fox  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-09-19 10:54:05 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							ff0fc18d0a 
							
						 
					 
					
						
						
							
							Mount home via NFS from apex in fox  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 15:34:02 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							19c7e32678 
							
						 
					 
					
						
						
							
							Allow access to NFS via wireguard subnet  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 15:33:47 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							017c19e7d0 
							
						 
					 
					
						
						
							
							Use 10.106.0.0/24 subnet to avoid collisions  
						
						
						
						
						The 106 byte is the code for 'j' (jungle) in ASCII:
	% printf j | od -t d
	0000000         106
	0000001
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:03:13 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							a36eff8749 
							
						 
					 
					
						
						
							
							Revert "Remove pam_slurm_adopt from fox"  
						
						
						
						
						This reverts commit 1eac0fcad8211195499bc566e6c70312b31af700.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:03:06 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							df17b11458 
							
						 
					 
					
						
						
							
							Enable fail2ban in fox  
						
						
						
						
						Protect fox against SSH brute-force attacks:
fox% sudo lastb | head
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:24 - 11:24  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:24 - 11:24  (00:00)
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:03:02 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							0dc7b7eb3d 
							
						 
					 
					
						
						
							
							Accept connections from apex to fox slurmd  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:03:00 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							dff6eaf587 
							
						 
					 
					
						
						
							
							Accept fox connection to slurm controller  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:02:59 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							4b6b67b587 
							
						 
					 
					
						
						
							
							Add fox machine to SLURM  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:02:57 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							6bbfb0d124 
							
						 
					 
					
						
						
							
							Make apex host specific to each machine  
						
						
						
						
						Allows direct contact via the VPN when accessing from fox, but uses the
Internet when accessing from the rest of the machines.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:02:49 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							46d03d5ca7 
							
						 
					 
					
						
						
							
							Add local host fox in apex  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:02:46 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							e366e6ce87 
							
						 
					 
					
						
						
							
							Enable wireguard in apex  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:02:43 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							e415f70bbb 
							
						 
					 
					
						
						
							
							Add wireguard server in fox  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-09-03 12:02:38 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							200c727bbf 
							
						 
					 
					
						
						
							
							Use writeShellScript for suspend.sh and resume.sh  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-08-29 12:35:28 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							7413021440 
							
						 
					 
					
						
						
							
							Add firewall rules to slurm server  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-08-29 12:35:26 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							20b4805335 
							
						 
					 
					
						
						
							
							Remove hut from slurm  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-08-29 12:35:24 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f7dff9deab 
							
						 
					 
					
						
						
							
							Only configure apex as slurm server  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-08-29 12:35:22 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f569933732 
							
						 
					 
					
						
						
							
							Split slurm configuration for client and server  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-08-29 12:35:20 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							ee895d2e4f 
							
						 
					 
					
						
						
							
							Move slurm control server to apex  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-08-29 12:35:16 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							5ee8623af2 
							
						 
					 
					
						
						
							
							Fix typo in csiringo ssh key  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-08-27 17:44:20 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							a0e4b209b0 
							
						 
					 
					
						
						
							
							Enable nix-ld in weasel  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-08-27 16:19:34 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							ce25867421 
							
						 
					 
					
						
						
							
							Add csiringo user with access to apex and weasel  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-08-27 16:02:26 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f89bba35a6 
							
						 
					 
					
						
						
							
							Access gitlab via raccoon in fox  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> 
						
						
					 
					
						2025-08-27 15:27:38 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							d591721a61 
							
						 
					 
					
						
						
							
							Move StartLimit* options to unit section  
						
						
						
						
						The StartLimitBurst and StartLimitIntervalSec options belong to the
[Unit] section, otherwise they are ignored in [Service]:
> Unknown key 'StartLimitIntervalSec' in section [Service], ignoring.
When using [Unit], the limits are properly set:
  apex% systemctl show power-policy.service | grep StartLimit
  StartLimitIntervalUSec=10min
  StartLimitBurst=10
  StartLimitAction=none
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
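
In NixOS terms, the move described above amounts to setting the limits under `unitConfig` rather than `serviceConfig`. The fragment below is a sketch under that assumption: the unit name and the two values are taken from the `systemctl show` output, while the surrounding module structure is hypothetical.

```nix
{ ... }:
{
  systemd.services.power-policy = {
    # Keys in unitConfig are written to the [Unit] section, which is
    # where systemd expects the StartLimit* options.
    unitConfig = {
      StartLimitBurst = 10;
      StartLimitIntervalSec = "10min";
    };
    # Placing the same keys in serviceConfig would write them to
    # [Service], where systemd logs "Unknown key ... ignoring" and the
    # limits never take effect.
  };
}
```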
						
						
					 
					
						2025-07-24 14:32:46 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							343b4f155e 
							
						 
					 
					
						
						
							
							Set power policy to always turn on  
						
						
						
						
						In all machines, as soon as we recover the power, turn the machine back
on. We cannot rely on the previous state as we will shut them down
before the power is cut to prevent damage to the power supply
monitoring circuit.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-07-24 11:22:38 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							39a211a846 
							
						 
					 
					
						
						
							
							Add NixOS module to control power policy  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-07-24 11:22:36 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							142985c505 
							
						 
					 
					
						
						
							
							Move August shutdown to 3rd at 22h  
						
						
						
						
						Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-07-24 11:22:33 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							3f3dc2d037 
							
						 
					 
					
						
						
							
							Disable automatic August shutdown for Fox  
						
						
						
						
						The UPC has different dates for the yearly power cut, and Fox can
recover properly from a power loss, so we don't need to have it turned
off before the power cut. Simply disabling the timer is enough.
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-07-24 11:22:10 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							3269d763aa 
							
						 
					 
					
						
						
							
							Add cudainfo program to test CUDA  
						
						
						
						
						The cudainfo program checks that we can initialize the CUDA RT library
and communicate with the driver. It can be used as a standalone program or
built with cudainfo.gpuCheck so that it is executed inside the build sandbox
to check that it also works there. It uses the autoAddDriverRunpath hook to
add the CUDA library directory to the runpath.
Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-07-23 11:52:09 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f2d8ee8552 
							
						 
					 
					
						
						
							
							Add missing symlink in cuda sandbox  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-07-23 11:51:47 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							8d984a0672 
							
						 
					 
					
						
						
							
							Enable cuda systemFeature in raccoon and fox  
						
						
						
						
						This allows running derivations that depend on the CUDA runtime without
breaking the sandbox. We only need to add `requiredSystemFeatures = [ "cuda" ];`
to the derivation.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
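
As an illustration of the derivation side, a package requesting the feature might look like the sketch below. The derivation name and the build script are hypothetical placeholders; only the `requiredSystemFeatures` line comes from the commit message.

```nix
{ pkgs ? import <nixpkgs> { } }:

# Hypothetical derivation: the name and the script are placeholders.
pkgs.runCommand "cuda-feature-example" {
  # Nix only schedules this build on machines that advertise the "cuda"
  # system feature (raccoon and fox, according to the commit title).
  requiredSystemFeatures = [ "cuda" ];
} ''
  # A real build would run a program that needs the CUDA runtime here.
  echo "built on a cuda-capable builder" > $out
''
```

A builder that does not advertise the feature should refuse to build the derivation rather than run it without GPU access.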
						
						
					 
					
						2025-07-22 17:07:13 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							f3733418b2 
							
						 
					 
					
						
						
							
							Move shared nvidia settings to a separate module  
						
						
						
						
						Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-07-22 17:06:45 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							ce8b05b142 
							
						 
					 
					
						
						
							
							Replace xeon07 by hut in ssh config  
						
						
						
						
						The xeon07 machine has been renamed to hut.
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es> 
						
						
					 
					
						2025-07-21 18:10:08 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							4a5787e0c6 
							
						 
					 
					
						
						
							
							Enable automatic Nix GC in raccoon  
						
						
						
						
						Reviewed-by: Aleix Boné <abonerib@bsc.es> 
						
						
					 
					
						2025-07-21 17:58:26 +02:00