64 Commits

SHA1 Message Date
c4e12872d9 Delay nix-gc until /home is mounted
Prevents the garbage collector from starting before the remote file
systems, in particular /home, are mounted. Otherwise, all the gcroots
that have symlinks in /home will be considered stale and removed.

See: #79
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
7c381b2b65 Add dbautist user with access to hut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
92482721b4 Set the serial console to ttyS1 in raccoon
Apparently the ttyS0 console doesn't exist, but ttyS1 does:

  raccoon% sudo stty -F /dev/ttyS0
  stty: /dev/ttyS0: Input/output error
  raccoon% sudo stty -F /dev/ttyS1
  speed 9600 baud; line = 0;
  -brkint -imaxbel

The dmesg line agrees:

  00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

The console configuration is then moved from base to xeon to allow
changing it for the raccoon machine.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
6f22f683a9 Add 10 min shutdown jitter to avoid spikes
The shutdown timer will fire at slightly different times on the
different nodes, so the power consumption decreases gradually.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
f1373e5227 Schedule shutdown for August 2nd for all machines
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
90d44a95eb Allow ptrace to any process of the same user
Allows users to attach GDB to their own processes without having to
start the program under GDB. It is only enabled on compute nodes; the
storage nodes keep the restricted settings.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
4873c881a9 Add abonerib user to hut, raccoon, owl1 and owl2
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
d6d7516e12 Grant rpenacob access to owl1 and owl2 nodes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
ae96be6915 Access private repositories via hut SSH proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
bb10c47c2e Set the default proxy to point to hut
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 16:40:16 +02:00
338551ec0a Move vlopez user to jungleUsers for koro host
Access to other machines can be easily added into the "hosts" attribute
without the need to replicate the configuration.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
c8604bf3e0 Split xeon specific configuration from base
To accommodate the raccoon knights workstation, some of the configuration
pulled by m/common/main.nix has to be removed. To solve it, the xeon
specific parts are placed into m/common/xeon.nix and only the common
configuration is at m/common/base.nix.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
970cdf8dbd Control user access to each machine
The users.jungleUsers configuration option behaves like the users.users
option, but also defines the list attribute `hosts` for each user, which
restricts each user so that they can only access those hosts.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
0899424de9 Move slurm client into a separate module
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 16:40:16 +02:00
b8fbb6380e Use tmpfs in /tmp
The /tmp directory was on the SSD, which is not erased across boots.
Nix uses /tmp to perform builds, so we want it to be as fast as
possible. In general, all the machines have enough memory to hold large
builds like LLVM in tmpfs.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
388a10b666 BSC packages are no longer in the bsc attribute
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-10-01 16:40:16 +02:00
9957a0269d Don't log SLURM connection attempts from ssfhead
2025-10-01 16:40:16 +02:00
6d8fd353d0 Enable direnv integration
2025-10-01 16:40:16 +02:00
642507b255 Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs still points to a different version than the one specified
in the jungle flake. Instead, use jungle#bscpkgs.X to get a package
from bscpkgs.
2025-10-01 16:40:16 +02:00
0ca0da9ffe Don't fetch registry flakes from the net
2025-10-01 16:40:16 +02:00
627c912b87 Enable fstrim service
2025-10-01 16:40:16 +02:00
79446cebcb Add encrypted munge key with agenix
2025-10-01 16:40:16 +02:00
061fc60939 Remove unused large port hole in firewall
2025-10-01 16:40:16 +02:00
a6324e47e8 Allow only some ports for srun
2025-10-01 16:40:16 +02:00
2f258e1cdd Block ssfhead from reaching our slurm daemon
2025-10-01 16:40:16 +02:00
4c88f9a783 Poweroff idle slurm nodes after 1 hour
2025-10-01 16:40:16 +02:00
01140353c6 Add IB and IPMI node host names
2025-10-01 16:40:16 +02:00
6850bf3a71 Add agenix to all nodes
2025-10-01 16:40:16 +02:00
8a027d8b09 Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2025-10-01 16:40:16 +02:00
1f32b8409a Add anavarro user
2025-10-01 16:40:16 +02:00
bc51564a88 Set zsh inc_append_history option
2025-10-01 16:40:16 +02:00
8ba4f910c3 Set zsh shell for rarias
2025-10-01 16:40:16 +02:00
515fa49ed0 Enable zsh and fix key bindings
2025-10-01 16:40:16 +02:00
c63fa494d5 Keep a log over time with the config commits
2025-10-01 16:40:16 +02:00
a6d3f43b98 Store nixos config in /etc/nixos/config.rev
2025-10-01 16:40:16 +02:00
409efacf5b Enable watchdog
2025-10-01 16:40:16 +02:00
9241bda0ac Also enable monitoring in lake2
2025-10-01 16:40:16 +02:00
3b823ee478 Move pkgs overlay to overlay.nix
2025-10-01 16:40:16 +02:00
80efd57a11 Add the lake2 hostname to the hosts
2025-10-01 16:40:16 +02:00
91270b26bb Add ceph metrics to prometheus
2025-10-01 16:40:16 +02:00
9cd013c4ed Add the bay host name
2025-10-01 16:40:16 +02:00
cd6e6de2ad Don't set all_proxy
2025-10-01 16:40:15 +02:00
d8e366b444 GRUB version no longer needed
2025-10-01 16:40:15 +02:00
8c1bf6db42 Kill slurmd remaining processes on upgrade
2025-10-01 16:40:15 +02:00
cbe53a6f0a Add koro node
2025-10-01 16:40:15 +02:00
9097811cc0 Enable NTP using the BSC time server
2025-10-01 16:40:15 +02:00
83acd40880 Add the ssfhead node as gateway
2025-10-01 16:40:15 +02:00
ba75bf8249 Use our host names first by default
2025-10-01 16:40:15 +02:00
e9845cc76a Add DNS tools to resolve hosts
2025-10-01 16:40:15 +02:00
d5951483ee Lower perf_event_paranoid to -1
2025-10-01 16:40:15 +02:00