27 Commits

Author SHA1 Message Date
08953f64fb Use extra- for substituters and trusted-public-keys
From the nix manual:

> A configuration setting usually overrides any previous value. However,
> for settings that take a list of items, you can prefix the name of the
> setting by extra- to append to the previous value.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-03 17:59:17 +02:00
b386d30380 Remove fox from SLURM
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-26 11:43:16 +02:00
bbf09ab960 Add UPC temperature sensor monitoring
These sensors are part of their air quality measurements, which just
happen to be very close to our server room.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-26 11:24:12 +02:00
3b5781ba63 Add meteocat exporter
Allows us to track ambient temperature changes and estimate the
temperature delta between the server room and exterior temperature.
We should be able to predict when we would need to stop the machines due
to excesive temperature as summer approaches.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-23 15:40:09 +02:00
4ed53d4384 Use hut nix cache in owl1, owl2 and raccoon
For owl1 and owl2 directly connect to hut via LAN with HTTP, but for
raccoon pass via the proxy using jungle.bsc.es with HTTPS. There is no
risk of tampering as packages are signed.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-02-26 16:03:26 +01:00
db04825a11 Remove SLURM partition all
We no longer have homogeneous nodes so it doesn't make much sense to
allocate a mix of them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-07 16:17:32 +02:00
5683fe5be1 Adjust fox slurm config after disabling SMT
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-03-28 11:04:19 +01:00
8ff54219f6 Reject SSH connections without SLURM allocation
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-13 14:47:38 +01:00
b046baee48 Exclude fox from being suspended by slurm
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-12 15:02:18 +01:00
a0eae1feea Add new fox machine
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-11 12:56:30 +01:00
3c1be2d4b4 Emulate other architectures in owl nodes too
Allows cross-compilation of packages for RISC-V that are known to try to
run RISC-V programs in the host.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-19 17:53:10 +02:00
be802804d1 Set default SLURM job time limit to one hour
Prevents enless jobs from being left forever, while allow users to
request a larger time limit.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-18 11:44:01 +02:00
e1967ccda6 Allow other jobs to run in unused cores
The current select mechanism was using the memory too as a consumable
resource, which by default only sets 1 MiB per node. As each job already
requests 1 MiB, it prevents other jobs from running.

As we are not really concerned with memory usage, we only use the unused
cores in the select criteria.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-18 11:19:03 +02:00
cd3284d1b2 Split xeon specific configuration from base
To accomodate the raccoon knights workstation, some of the configuration
pulled by m/common/main.nix has to be removed. To solve it, the xeon
specific parts are placed into m/common/xeon.nix and only the common
configuration is at m/common/base.nix.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-06-03 09:20:11 +02:00
91a42375e3 Control user access to each machine
The users.jungleUsers configuration option behaves like the users.users
option, but defines the list attribute `hosts` for each user, which
filters users so that only the user can only access those hosts.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-06-06 14:06:33 +02:00
b5da1c6521 Remove nixseparatedebuginfod input
It has been integrated in nixpkgs, so is no longer required.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-03-14 16:44:21 +01:00
df5a5e1668 Move slurm client in a separate module
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-02-09 11:14:34 +01:00
1c6e5d8f82 Enable nixseparatedebuginfod module
The module is only enabled on Hut and Eudy because we noticed activity
on the debuginfod service even if no debug session was active.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-12-01 19:57:04 +01:00
4d833d2088 Remove complete ceph package from hut
Only the ceph-client is needed.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-20 12:57:31 +01:00
24c05e5ebf Remove user/group when using DynamicUsers 2023-09-22 10:13:06 +02:00
7aef154dd4 Set the SLURM_CONF variable 2023-09-21 22:18:30 +02:00
4ca4e0fae9 Enable slurm-exporter service 2023-09-21 21:38:34 +02:00
d4c803dbfb Mount the hut nix store for SLURM jobs 2023-09-20 18:26:48 +02:00
722c0b0eaa Open ports in firewall of compute nodes 2023-09-14 15:45:43 +02:00
ae4ad95902 Add agenix to all nodes 2023-09-04 22:09:40 +02:00
3cc7b33c5a Add agenix module to ceph 2023-09-04 22:06:20 +02:00
c13022596a Move the ceph client config to an external module 2023-09-04 21:59:04 +02:00