1512 Commits

Author SHA1 Message Date
20e7d244d1 Rekey secrets with trusted fox key
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:55 +02:00
c5d3b8e7f0 Trust fox for compute node secrets
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:52 +02:00
6bbfb0d124 Make apex host specific to each machine
Allows direct contact via the VPN when accessing from fox, but use
Internet when using the rest of the machines.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:49 +02:00
46d03d5ca7 Add local host fox in apex
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:46 +02:00
e366e6ce87 Enable wireguard in apex
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:43 +02:00
e415f70bbb Add wireguard server in fox
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:38 +02:00
200c727bbf Use writeShellScript for suspend.sh and resume.sh
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:28 +02:00
7413021440 Add firewall rules to slurm server
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:26 +02:00
20b4805335 Remove hut from slurm
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:24 +02:00
f7dff9deab Only configure apex as slurm server
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:22 +02:00
f569933732 Split slurm configuration for client and server
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:20 +02:00
ee895d2e4f Move slurm control server to apex
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:16 +02:00
5ee8623af2 Fix typo in csiringo ssh key
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 17:44:20 +02:00
a0e4b209b0 Enable nix-ld in weasel
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 16:19:34 +02:00
ce25867421 Add csiringo user with access to apex and weasel
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 16:02:26 +02:00
f89bba35a6 Access gitlab via raccoon in fox
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-27 15:27:38 +02:00
2c8d7ed855 Build nOS-V with PAPI support
To support the new instrumentation for HWC it would be useful to already
build nOS-V with PAPI support enabled. The enablePapi switch allows it
to be disabled with `nosv.override { enablePapi = false; }`.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-01 13:12:48 +02:00
d591721a61 Move StartLimit* options to unit section
The StartLimitBurst and StartLimitIntervalSec options belong to the
[Unit] section, otherwise they are ignored in [Service]:

> Unknown key 'StartLimitIntervalSec' in section [Service], ignoring.

When using [Unit], the limits are properly set:

  apex% systemctl show power-policy.service | grep StartLimit
  StartLimitIntervalUSec=10min
  StartLimitBurst=10
  StartLimitAction=none

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 14:32:46 +02:00
343b4f155e Set power policy to always turn on
In all machines, as soon as we recover the power, turn the machine back
on. We cannot rely on the previous state as we will shut them down
before the power is cut to prevent damage on the power supply
monitoring circuit.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:38 +02:00
39a211a846 Add NixOS module to control power policy
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:36 +02:00
142985c505 Move August shutdown to 3rd at 22h
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:33 +02:00
3f3dc2d037 Disable automatic August shutdown for Fox
The UPC has different dates for the yearly power cut, and Fox can
recover properly from a power loss, so we don't need to have it turned
off before the power cut. Simply disabling the timer is enough.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:10 +02:00
3269d763aa Add cudainfo program to test CUDA
The cudainfo program checks that we can initialize the CUDA RT library
and communicate with the driver. It can be used as standalone program or
built with cudainfo.gpuCheck so it is executed inside the build sandbox
to see if it also works fine. It uses the autoAddDriverRunpath hook to
inject in the runpath the location of the library directory for CUDA
libraries.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:52:09 +02:00
f2d8ee8552 Add missing symlink in cuda sandbox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:51:47 +02:00
8d984a0672 Enable cuda systemFeature in raccoon and fox
This allows running derivations which depend on cuda runtime without
breaking the sandbox. We only need to add `requiredSystemFeatures = [ "cuda" ];`
to the derivation.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:07:13 +02:00
f3733418b2 Move shared nvidia settings to a separate module
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:06:45 +02:00
1666c14a35 Remove dependency on wx from paraver kernel
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 13:00:23 +02:00
b29f03ba6e Fix boost >=1.87 in wxparaver
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 12:59:53 +02:00
ae2ef1d2df Add patch to paraver to prevent focus stealing
See: https://github.com/bsc-performance-tools/wxparaver/issues/18
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 12:59:39 +02:00
9a48ae45bb Update paraver to 4.12.0
Adds a new patch to fix libxml2: the m4 AM_PATH_XML2 macro has been
deprecated and is no longer included in the latest nixpkgs unstable.
Upstream recommends using `PKG_CHECK_MODULES` instead.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 12:57:33 +02:00
ce8b05b142 Replace xeon07 by hut in ssh config
The xeon07 machine has been renamed to hut.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 18:10:08 +02:00
974bb56dc3 Fix lmbench build issues with GCC 14
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 18:01:36 +02:00
88d4d8e317 Drop tagaspi from bench6
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 18:01:20 +02:00
885e04e446 Remove unused inputs from intel compiler 2023
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 18:00:39 +02:00
4a5787e0c6 Enable automatic Nix GC in raccoon
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:26 +02:00
6c11093033 Select proprietary NVIDIA driver in raccoon
The NVIDIA GTX 960 from 2016 has the Maxwell architecture, and NixOS
suggests using the proprietary driver for older than Turing:

> It is suggested to use the open source kernel modules on Turing or
> later GPUs (RTX series, GTX 16xx), and the closed source modules
> otherwise.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:21 +02:00
26f52aa27d Use gcc 13 for intel compiler 2023
Intel compiler for C++ (icpc) is not able to parse the location of C++
headers from the output of gcc 14, but works fine for gcc 13.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
52fe43bfe1 Fix PATH in intel compiler wrapper
We need to add the gcc in the PATH, but adding it directly to $PATH
doesn't work, as it will be restored to $path_backup before icc runs. So
for now we simply inject it to path_backup, but ideally we should find a
more robust solution.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
f0637b4569 Drop GPI-2 and TAGASPI
GPI-2 fails to build, which is needed for TAGASPI.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
6ddfea0a3a Use boost 1.86 for paraver
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 17:23:30 +02:00
e7adef1ffa Fix intel-compiler by ignoring broken symlinks
In the future, we may want to look if those symlinks are needed.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
e82d3c3b9f Upgrade OmpSs-2 LLVM to 2025.06
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
4442b6a706 Update TAMPI to 4.1
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
2d0b014dc7 Update Nanos6 to 4.3
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
867ba3ec5a Update nOS-V to 3.2.0
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
2cacc2b265 Update ovni to 1.12.0
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
e4abd8d8f6 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'path:/nix/store/2csx2kkb2hxyxhhmg2xs9jfyypikwwk6-source?lastModified=1736867362&narHash=sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8%3D&rev=9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc' (2025-01-14)
  → 'path:/nix/store/zk8v61cpk1wprp9ld5ayc1g5fq4pdkwv-source?lastModified=1752436162&narHash=sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw%3D&rev=dfcd5b901dbab46c9c6e80b265648481aafb01f8' (2025-07-13)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
750504744f Enable open source NVidia driver in fox
It is recommended for newer versions.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:38 +02:00
c26ec1b6f1 Remove option allowUnfree from fox and raccoon
It is already set to true for all machines.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:21 +02:00
2ef32f773c Ban another scanner trying to connect via SSH
It is constantly spamming out logs:

apex# journalctl | grep 'Connection closed by 84.88.52.176' | wc -l
2255

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:51:49 +02:00