70 Commits

Author SHA1 Message Date
cf722cbf0d Add abonerib user to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:17 +02:00
9533cb008c Add users to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:17 +02:00
07f8a1763f Add dalvare1 user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:17 +02:00
2e5379b9bf Use IPMI host names instead of IP addresses
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:17 +02:00
364f05e7af Add new fox machine
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
539a72aad6 Add BSC machines to ssh config
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
574f2bdc4f Delay nix-gc until /home is mounted
Prevents starting the garbage collector before the remote file systems
are mounted, in particular /home. Otherwise, all the gcroots that have
symlinks in /home would be considered stale and removed.

See: #79
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
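A minimal sketch of one way to express the nix-gc ordering from 574f2bdc4f in a NixOS module. It assumes /home is mounted through a systemd mount unit, and it relies on the standard systemd RequiresMountsFor= directive; the commit itself may use a different mechanism.

```nix
{ ... }:
{
  # Assumption: /home is backed by a systemd mount unit (home.mount).
  # RequiresMountsFor= adds Requires= and After= dependencies on that
  # mount, so the garbage collector is not started before /home exists.
  systemd.services.nix-gc.unitConfig.RequiresMountsFor = "/home";
}
```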
935553431e Add dbautist user with access to hut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
4489eac910 Set the serial console to ttyS1 in raccoon
Apparently the ttyS0 console doesn't exist but ttyS1 does:

  raccoon% sudo stty -F /dev/ttyS0
  stty: /dev/ttyS0: Input/output error
  raccoon% sudo stty -F /dev/ttyS1
  speed 9600 baud; line = 0;
  -brkint -imaxbel

The dmesg line agrees:

  00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

The console configuration is then moved from base to xeon to allow
changing it for the raccoon machine.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
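A hedged sketch of what a per-machine console override for raccoon (4489eac910) might look like. It only uses the generic boot.kernelParams option; the baud rate and any other console parameters are left out because the commit text does not state them.

```nix
{ ... }:
{
  # Assumption: the serial console is selected with a kernel parameter.
  # ttyS1 is the port that the stty and dmesg output in the commit
  # message report as working on raccoon.
  boot.kernelParams = [ "console=ttyS1" ];
}
```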
2182d117a8 Add 10 min shutdown jitter to avoid spikes
The shutdown timer will fire at slightly different times for the
different nodes, so we slowly decrease the power consumption.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
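The jitter described in 2182d117a8 could be sketched with a systemd timer as follows. The unit name scheduled-shutdown and the OnCalendar time are placeholders, not taken from the repository; RandomizedDelaySec= is the standard systemd option that delays each firing by a random amount up to the given value.

```nix
{ ... }:
{
  # Hypothetical timer name; the real unit may be named differently.
  systemd.timers.scheduled-shutdown = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      # Placeholder time of day for the August 2nd shutdown.
      OnCalendar = "*-08-02 00:00:00";
      # Each node waits a random delay of up to 10 minutes, so the
      # machines do not all power off at the same instant.
      RandomizedDelaySec = "10min";
      # Activate the poweroff target directly instead of a service.
      Unit = "poweroff.target";
    };
  };
}
```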
123598cd03 Program shutdown for August 2nd for all machines
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
706b9049fa Allow ptrace to any process of the same user
Allows users to attach GDB to their own processes, without requiring
running the program with GDB from the start. It is only enabled on
compute nodes; the storage nodes keep the restricted settings.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
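As an illustration of 706b9049fa, the Yama ptrace restriction can be relaxed through a sysctl. This is a sketch assuming the change is made with boot.kernel.sysctl; where the setting lives in the repository is not shown in the log.

```nix
{ ... }:
{
  # kernel.yama.ptrace_scope = 0 restores the classic ptrace rules:
  # a process may attach to any other process running under the same
  # user. A value of 1 only allows attaching to descendant processes.
  boot.kernel.sysctl."kernel.yama.ptrace_scope" = 0;
}
```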
aa330aaf63 Add abonerib user to hut, raccoon, owl1 and owl2
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
857196efb7 Grant rpenacob access to owl1 and owl2 nodes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
cf0a41cbc4 Access private repositories via hut SSH proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
e81f66d0c4 Set the default proxy to point to hut
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
d93914f91d Move vlopez user to jungleUsers for koro host
Access to other machines can be easily added into the "hosts" attribute
without the need to replicate the configuration.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
f74ec2bce1 Split xeon specific configuration from base
To accommodate the raccoon knights workstation, some of the configuration
pulled by m/common/main.nix has to be removed. To solve it, the xeon
specific parts are placed into m/common/xeon.nix and only the common
configuration is at m/common/base.nix.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
ededd2083b Control user access to each machine
The users.jungleUsers configuration option behaves like the users.users
option, but defines the list attribute `hosts` for each user, which
restricts that user to accessing only the listed hosts.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
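The host filtering described in ededd2083b can be sketched roughly as the module below. This is a simplified illustration, not the repository's implementation: it declares users.jungleUsers as free-form attribute sets rather than reusing the users.users submodule type, and it assumes the machine is identified by networking.hostName.

```nix
{ config, lib, ... }:
let
  # Assumption: each machine is identified by its host name.
  host = config.networking.hostName;

  # Keep only the users whose `hosts` list names this machine.
  allowed = lib.filterAttrs
    (_name: user: builtins.elem host (user.hosts or [ ]))
    config.users.jungleUsers;
in
{
  options.users.jungleUsers = lib.mkOption {
    # Simplified type: plain attribute sets, one per user.
    type = lib.types.attrsOf lib.types.attrs;
    default = { };
    description = "Users, each with a `hosts` list of machines they may access.";
  };

  # Drop the extra `hosts` attribute before passing the remaining
  # settings on to the regular users.users option.
  config.users.users =
    lib.mapAttrs (_name: user: removeAttrs user [ "hosts" ]) allowed;

  # Example entry (user and host names taken from the log above):
  #   users.jungleUsers.vlopez = { isNormalUser = true; hosts = [ "koro" ]; };
}
```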
35b4a30f2e Move slurm client into a separate module
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 16:40:16 +02:00
845adfc937 Use tmpfs in /tmp
The /tmp directory was using the SSD disk which is not erased across
boots. Nix will use /tmp to perform the builds, so we want it to be as
fast as possible. In general, all the machines have enough space to
handle large builds like LLVM.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
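For reference, a sketch of how /tmp on tmpfs (845adfc937) is commonly enabled in NixOS. The option name depends on the release: boot.tmp.useTmpfs is used in recent releases, while older ones call it boot.tmpOnTmpfs. Which one the repository uses is not stated in the log.

```nix
{ ... }:
{
  # Mount /tmp as a tmpfs so that builds run in memory and the
  # directory starts empty on every boot.
  boot.tmp.useTmpfs = true;
}
```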
34628c0e39 BSC packages are no longer in bsc attribute
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
4e333dca21 Don't log SLURM connection attempts from ssfhead 2025-10-01 16:40:16 +02:00
beae9d240e Enable direnv integration 2025-10-01 16:40:16 +02:00
e925b00489 Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2025-10-01 16:40:16 +02:00
87871de141 Don't fetch registry flakes from the net 2025-10-01 16:40:16 +02:00
a992b266bb Enable fstrim service 2025-10-01 16:40:16 +02:00
1b5469af13 Add encrypted munge key with agenix 2025-10-01 16:40:16 +02:00
78c883a274 Remove unused large port hole in firewall 2025-10-01 16:40:16 +02:00
241b888a7c Allow only some ports for srun 2025-10-01 16:40:16 +02:00
b7aba3d15c Block ssfhead from reaching our slurm daemon 2025-10-01 16:40:16 +02:00
e35b51cd00 Poweroff idle slurm nodes after 1 hour 2025-10-01 16:40:16 +02:00
2e460f49bd Add IB and IPMI node host names 2025-10-01 16:40:16 +02:00
e7aa2d3fe3 Add agenix to all nodes 2025-10-01 16:40:16 +02:00
224bafd20d Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2025-10-01 16:40:16 +02:00
8b1fa938ea Add anavarro user 2025-10-01 16:40:16 +02:00
94b110dc57 Set zsh inc_append_history option 2025-10-01 16:40:16 +02:00
0c5207bd2d Set zsh shell for rarias 2025-10-01 16:40:16 +02:00
4200e6162d Enable zsh and fix key bindings 2025-10-01 16:40:16 +02:00
c0ae33fbb5 Keep a log over time with the config commits 2025-10-01 16:40:16 +02:00
ff00c2be8d Store nixos config in /etc/nixos/config.rev 2025-10-01 16:40:16 +02:00
ad491140f4 Enable watchdog 2025-10-01 16:40:16 +02:00
9f71aae1ff Also enable monitoring in lake2 2025-10-01 16:40:16 +02:00
a48ae143cc Move pkgs overlay to overlay.nix 2025-10-01 16:40:16 +02:00
5f2fe97cd4 Add the lake2 hostname to the hosts 2025-10-01 16:40:16 +02:00
71000731c0 Add ceph metrics to prometheus 2025-10-01 16:40:16 +02:00
503a63539c Add the bay host name 2025-10-01 16:40:16 +02:00
8cb7cf087c Don't set all_proxy 2025-10-01 16:40:15 +02:00
d92e06d7b7 GRUB version no longer needed 2025-10-01 16:40:15 +02:00
a096a386a0 Kill slurmd remaining processes on upgrade 2025-10-01 16:40:15 +02:00