1478 Commits

Author SHA1 Message Date
dab6f08d89 Set nixpkgs to track nixos-24.11
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 15:43:13 +01:00
8190523c30 Add script to monitor GPFS
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 15:43:07 +01:00
d335d69ba6 Add BSC machines to ssh config
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:51 +01:00
cec49eb5fc Collect statistics from logged users
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:48 +01:00
22db38c98f Add custom GPFS exporter for MN5
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:46 +01:00
0d4eebbb59 Remove exception to fetch task endpoint
It causes the request to go to the website rather than the Gitea
service.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:43 +01:00
025f6a0c0c Use SSD for boot, then switch to NVME
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:40 +01:00
abc74c5445 Use NVME as root
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:37 +01:00
6942f09f69 Keep host header for Grafana requests
This was breaking requests due to CSRF check.

See: https://github.com/grafana/grafana/issues/45117#issuecomment-1033842787
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:32 +01:00
56f6855af7 Ignore logging requests from the gitea runner
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:28 +01:00
81c822e68e Log the client IP not the proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:22 +01:00
53e80b1f19 Ignore misc directory
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:19 +01:00
21feb01e7b Create paste directories in /ceph/p
Ensure that all hut users have a paste directory in /ceph/p owned by
themselves. We need to wait for the ceph mount point to create them, so
we use a systemd service that waits for the remote-fs.target.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:16 +01:00
9ea7b2b475 Add p command to paste files
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:10 +01:00
fce4d89e1d Use nginx to serve website and other services
Instead of using multiple tunels to forward all our services to the VM
that serves jungle.bsc.es, just use nginx to redirect the traffic from
hut. This allows adding custom rules for paths that are not posible
otherwise.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:07 +01:00
6b282375f8 Mount the NVME disk in /nvme
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:22:58 +01:00
6782fc6c5b Add cacheline parameter to nOS-V
By default it is set to 64 bits. The cacheline parameter is required
when cross-compiling nOS-V, as it cannot be read from the build machine.

Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:16:03 +01:00
73550ad5a9 Remove unneeded NODES dependencies
The autoreconfHook helper already provides autotools binaries. Also NODES
no longer uses papi.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:16:03 +01:00
48d67ef6c2 Fix NODES native dependencies
Move NODES build tools to nativeBuildInputs. This is needed for
cross-compilation, given that build tools must much the build system.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:16:03 +01:00
Raúl Peñacoba
73e30d20e9 Python is needed in openmp now
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:09:27 +01:00
5f85082553 Update sonar to 1.0.1
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:10:55 +01:00
46f15ac201 Update LLVM to 2024.11
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:10:52 +01:00
b442ddf1a4 Update Nanos6 to 4.2
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:10:50 +01:00
b006538147 Update TAMPI to 4.0
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:09:15 +01:00
995aa0b2e2 Update NODES to 1.3
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:09:12 +01:00
896ec0ad0f Update nOS-V to 3.1.0
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:09:09 +01:00
2d9d2701a9 Update ovni to 1.11.0
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:08:50 +01:00
74e11db8b6 Only enable MPI in ovni on native builds
Tested with:

hut% nix build .#bsc-ci.all
hut% nix build .#pkgsCross.riscv64.ovni

Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Reviewed-by: Aleix Boné <aleix.boneribo@bsc.es>
2024-10-28 13:42:19 +01:00
e046363e52 nos-v: fix cross compilation
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-10-28 13:40:35 +01:00
aa3f816388 ovni: fix cross compilation
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-10-28 13:40:33 +01:00
3eff2662bb paraver: install manpages
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-10-28 13:39:52 +01:00
260986b9f2 Delay nix-gc until /home is mounted
Prevents starting the garbage collector before the remote FS are
mounted, in particular /home. Otherwise, all the gcroots which have
symlinks in /home will be considered stale and they will be removed.

See: #79
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-09-20 09:45:30 +02:00
15afbe94bd Add dbautist user with access to hut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-09-20 09:42:02 +02:00
efd35a9cd1 Set the serial console to ttyS1 in raccoon
Apparently the ttyS0 console doesn't exist but ttyS1 does:

  raccoon% sudo stty -F /dev/ttyS0
  stty: /dev/ttyS0: Input/output error
  raccoon% sudo stty -F /dev/ttyS1
  speed 9600 baud; line = 0;
  -brkint -imaxbel

The dmesg line agrees:

  00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

The console configuration is then moved from base to xeon to allow
changing it for the raccoon machine.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:56 +02:00
50ad1d637c Remove setLdLibraryPath and driSupport options
They have been removed from NixOS. The "hardware.opengl" group is now
renamed to "hardware.graphics".

See: 98cef4c273
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:53 +02:00
c299d53146 Add documentation section about GRUB chain loading
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:47 +02:00
152b71e718 Add 10 min shutdown jitter to avoid spikes
The shutdown timer will fire at slightly different times for the
different nodes, so we slowly decrease the power consumption.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:44 +02:00
0911d5b92a Don't mount the nix store in owl nodes
Initially we planned to run jobs in those nodes by sharing the same nix
store from hut. However, these nodes are now used to build packages
which are not available in hut. Users also ssh to the nodes, which
doesn't mount the hut store, so it doesn't make much sense to keep
mounting it.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:42 +02:00
5ddae068af Emulate other architectures in owl nodes too
Allows cross-compilation of packages for RISC-V that are known to try to
run RISC-V programs in the host.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:39 +02:00
d17be714ec Program shutdown for August 2nd for all machines
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:36 +02:00
28ce15d74d Enable debuginfod daemon in owl nodes
WARNING: This will introduce noise, as the daemon wakes up from time to
time to check for new packages.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:30 +02:00
504f9bb570 Set gitea and grafana log level to warn
Prevents filling the journal logs with information messages.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:27 +02:00
f158cb63e8 Set default SLURM job time limit to one hour
Prevents enless jobs from being left forever, while allow users to
request a larger time limit.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:24 +02:00
8860f76cad Allow other jobs to run in unused cores
The current select mechanism was using the memory too as a consumable
resource, which by default only sets 1 MiB per node. As each job already
requests 1 MiB, it prevents other jobs from running.

As we are not really concerned with memory usage, we only use the unused
cores in the select criteria.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:22 +02:00
b86798cd69 Use authentication tokens for PM GitLab runner
Starting with GitLab 16, there is a new mechanism to authenticate the
runners via authentication tokens, so use it instead.  Older tokens and
runners are also removed, as they are no longer used.

With the new way of managing tokens, both the tags and the locked state
are managed from the GitLab web page.

See: https://docs.gitlab.com/ee/ci/runners/new_creation_workflow.html
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:16 +02:00
7ed74931cf flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/1381a759b205dff7a6818733118d02253340fd5e' (2024-04-02)
  → 'github:ryantm/agenix/de96bd907d5fbc3b14fc33ad37d1b9a3cb15edc6' (2024-07-09)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/6143fc5eeb9c4f00163267708e26191d1e918932' (2024-04-21)
  → 'github:NixOS/nixpkgs/693bc46d169f5af9c992095736e82c3488bf7dbb' (2024-07-14)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:13 +02:00
6e9d33b483 Allow ptrace to any process of the same user
Allows users to attach GDB to their own processes, without requiring
running the program with GDB from the start. It is only available in
compute nodes, the storage nodes continue with the restricted settings.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:09 +02:00
58abaefbc4 Add abonerib user to hut, raccon, owl1 and owl2
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:07 +02:00
5ea7827a8a Grant rpenacob access to owl1 and owl2 nodes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:05 +02:00
b17e4a13f9 Access private repositories via hut SSH proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:03 +02:00