Update to nixpkgs 25.11 (Xantusia) #218

Manually merged
rarias merged 15 commits from upgrade/25.11 into master 2026-01-20 13:51:15 +01:00
Collaborator

Update to nixpkgs 25.11 (Xantusia)

  • [NixOS 25.11 release announcement](https://nixos.org/blog/announcements/2025/nixos-2511/)
  • [NixOS release notes](https://nixos.org/manual/nixos/stable/release-notes.html#sec-release-25.11)
  • [nixpkgs release notes](https://nixos.org/manual/nixpkgs/stable/release-notes#sec-nixpkgs-release-25.11)

Compiler changes:

LLVM has been updated to version 21. GCC remains at version 14. CMake was updated to version 4.

Broken:

  • mercurium (mcxx)
abonerib added 9 commits 2025-12-02 14:14:27 +01:00
Fixes:
```
To build with setuptools as before, set `pyproject = true` and `build-system = [ setuptools ]`.
```
See: https://github.com/NixOS/nixpkgs/pull/437723
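For reference, a minimal sketch of what the migrated packaging looks like, with a hypothetical package (name, version and hash are placeholders):

```nix
# Hypothetical sketch: migrating a python package off the removed
# implicit-setuptools path, per the nixpkgs message above.
{ lib, buildPythonPackage, fetchPypi, setuptools }:

buildPythonPackage rec {
  pname = "example";             # placeholder package name
  version = "1.0.0";             # placeholder version
  src = fetchPypi {
    inherit pname version;
    hash = lib.fakeHash;         # placeholder hash
  };

  pyproject = true;              # build as a PEP 517 project
  build-system = [ setuptools ]; # declare the build backend explicitly
}
```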
The option `systemd.watchdog.runtimeTime' defined in `/nix/store/m7h6slsq394m872xnhxsxqrkhndz1lqs-source/m/common/base/watchdog.nix' has been renamed to `systemd.settings.Manager.RuntimeWatchdogSec'.
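The rename itself is mechanical; a sketch with a placeholder value (the real value lives in watchdog.nix):

```nix
{
  # Before (25.05):
  # systemd.watchdog.runtimeTime = "30s";

  # After (25.11), per the rename message above:
  systemd.settings.Manager.RuntimeWatchdogSec = "30s";
}
```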
Upgrade to nixseparatedebuginfod2
Some checks failed
CI / build:all (pull_request) Failing after 49m58s
CI / build:cross (pull_request) Successful in 49m56s
4545fbf08f
Author
Collaborator

Evaluation warnings when building hut:

```
evaluation warning: linuxPackages.perf is now perf
evaluation warning: Runner registration tokens have been deprecated and disabled by default in GitLab >= 17.0.
                    Consider migrating to runner authentication tokens by setting `services.gitlab-runner.services.gitlab-bsc-docker.authenticationTokenConfigFile`.
                    https://docs.gitlab.com/17.0/ee/ci/runners/new_creation_workflow.html
Done. The new configuration is /nix/store/nffq9ynzlrlx4m7phqgn621dcy3731xm-nixos-system-hut-25.11.20251130.8bb5646
```
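The gitlab-runner warning asks for a one-option migration; a sketch, assuming the token file (path is a placeholder) is provisioned separately:

```nix
{
  services.gitlab-runner.services.gitlab-bsc-docker = {
    # Replaces the deprecated registration token. The file holds the
    # runner authentication token created in the GitLab UI, e.g. a line
    # CI_SERVER_TOKEN=glrt-...
    authenticationTokenConfigFile = "/run/secrets/gitlab-runner-token"; # placeholder path
  };
}
```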
abonerib added 1 commit 2025-12-02 14:40:37 +01:00
linuxPackages.perf is now perf
Some checks failed
CI / build:cross (pull_request) Successful in 23m50s
CI / build:all (pull_request) Failing after 23m55s
4fa4005056
abonerib added this to the 25.11 Release milestone 2025-12-02 14:45:40 +01:00
abonerib added 1 commit 2025-12-02 14:48:42 +01:00
Enable papi when cross-compiling
Some checks failed
CI / build:cross (pull_request) Successful in 15m45s
CI / build:all (pull_request) Failing after 15m47s
111fcc61d8
abonerib force-pushed upgrade/25.11 from 111fcc61d8 to 408b974433 2025-12-02 16:27:29 +01:00 Compare
abonerib force-pushed upgrade/25.11 from 408b974433 to 00a7122768 2025-12-02 17:53:23 +01:00 Compare
abonerib changed title from WIP: nixpkgs 25.11 to Update to nixpkgs 25.11 (Xantusia) 2025-12-02 17:54:17 +01:00
abonerib requested review from rarias 2025-12-02 17:59:00 +01:00
abonerib force-pushed upgrade/25.11 from 00a7122768 to 1d3bda33a0 2025-12-03 10:15:20 +01:00 Compare
Owner

Thanks! Looks good. I would need to upgrade all machines to test it (including Fox due to SLURM), so I would rather do it after Christmas unless we need some fixes before that. We have a custom AMD driver in Fox; could you also build the configuration for Fox to see if it still compiles?

CC: @varcila you were doing some experiments in Fox and this will upgrade the kernel (but not your development shell).

Collaborator

Thanks for the copy. FYI, I have finished most of the batch of jobs I needed to execute this year, so I will most probably not use Fox until the second of January when I come back from holidays. Just to say that I have no preference for when the upgrade is done :)

abonerib added 1 commit 2025-12-10 14:34:58 +01:00
Remove conflicting definitions in amd-uprof-driver
All checks were successful
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 47m43s
ee9af71da0
See: https://lkml.org/lkml/2025/4/9/1709
Author
Collaborator

> We have a custom AMD driver in Fox, could you also build the configuration for Fox to see if it still compiles?

`amd-uprof-driver` is broken: https://jungle.bsc.es/p/abonerib/B8gcl28j.log

It is caused by the definitions of `rdmsrq` and `wrmsrq` in `inc/PwrProfAsm.h`, which now collide with the kernel's own: https://lkml.org/lkml/2025/4/9/1709

Doing a grep on the driver source, it seems they are not used anywhere, and since `amd-uprof` comes from a binary blob, I think it should be safe to remove them? @varcila

I have added a patch to comment them out, and the fox config now builds.

> I would rather do it after Christmas unless we need some fixes before that.

No rush from my side, we can merge it once we come back from vacations.
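A minimal sketch of the same idea as a `postPatch` in the derivation (the sed pattern is hypothetical; the actual change is a committed patch file):

```nix
{
  # Hypothetical sketch: comment out the driver's own rdmsrq/wrmsrq
  # definitions in inc/PwrProfAsm.h so they no longer collide with the
  # ones the kernel now provides.
  postPatch = ''
    sed -i 's,^.*\b\(rdmsrq\|wrmsrq\)\b.*$,// &,' inc/PwrProfAsm.h
  '';
}
```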
Collaborator

> (...) I think it should be safe to remove them? @varcila

We can try, I think it makes sense to remove them.

rarias force-pushed upgrade/25.11 from ee9af71da0 to 14fe50fc2a 2026-01-07 16:48:19 +01:00 Compare
rarias added 1 commit 2026-01-07 17:50:01 +01:00
Fix infiniband interface name
All checks were successful
CI / build:all (pull_request) Successful in 54m23s
CI / build:cross (pull_request) Successful in 1h6m13s
7686a75fd5
Owner

Fixed infiniband name in hut and switched to 25.11. I have also updated the nixpkgs commit so we pick the backported fixes. Everything else seems to be working fine so far.

I will propagate the upgrade to the rest of machines in the following days.

Owner

Upgraded bay and lake2 (ceph storage). After rebooting lake2, three (of four) NVMe disks are missing:

```
lake2% ls /dev/nvme*
/dev/nvme0  /dev/nvme0n1

lake2% sudo dmesg | grep nvme
[    8.123120] nvme nvme0: pci function 0000:81:00.0
[    8.129975] nvme nvme0: 31/0/0 default/read/poll queues
[   16.436669] nvme nvme0: using unchecked data buffer
```

Let's see if rebooting it fixes it.
Owner

They are back:

```
lake2% ls /dev/nvme*
/dev/nvme0  /dev/nvme0n1  /dev/nvme1  /dev/nvme1n1  /dev/nvme2  /dev/nvme2n1  /dev/nvme3  /dev/nvme3n1

lake2% sudo dmesg | grep nvme
[    8.128416] nvme nvme0: pci function 0000:83:00.0
[    8.128615] nvme nvme1: pci function 0000:84:00.0
[    8.128791] nvme nvme2: pci function 0000:85:00.0
[    8.128968] nvme nvme3: pci function 0000:86:00.0
[    8.136478] nvme nvme3: 31/0/0 default/read/poll queues
[    8.136522] nvme nvme2: 31/0/0 default/read/poll queues
[    8.143575] nvme nvme0: 31/0/0 default/read/poll queues
[    8.147813] nvme nvme1: 31/0/0 default/read/poll queues
[   16.434508] nvme nvme3: using unchecked data buffer
```

Something must be going on with the BIOS / BMC boot, as the PCI address has changed for the nvme0 disk. I don't think it is related to the upgrade. Ceph is fine and recovering now:

```
lake2% sudo ceph -s
  cluster:
    id:     9c8d06e0-485f-4aaf-b16b-06d6daf1232b
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum bay (age 26m)
    mgr: bay(active, since 26m)
    mds: 1/1 daemons up, 1 standby
    osd: 8 osds: 8 up (since 3m), 8 in (since 3m); 37 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 545 pgs
    objects: 1.25M objects, 1.4 TiB
    usage:   4.3 TiB used, 4.5 TiB / 8.7 TiB avail
    pgs:     111627/3750123 objects misplaced (2.977%)
             516 active+clean
             22  active+remapped+backfill_wait
             7   active+remapped+backfilling

  io:
    recovery: 307 MiB/s, 223 objects/s
```
rarias force-pushed upgrade/25.11 from 7686a75fd5 to 4a6e36c7e9 2026-01-08 15:17:01 +01:00 Compare
Collaborator

@rarias Can we delay the upgrade of fox until the 17th of January? One day after the WAMTA deadline; turns out getting results never ends

Owner

> @rarias Can we delay the upgrade of fox until the 17th of January? One day after the WAMTA deadline; turns out getting results never ends

Sure, I will leave apex, fox, owl1 and owl2 as-is until after the 17th, as they all need SLURM to be upgraded at the same time.

Raccoon and tent (including this Gitea service) have just been upgraded; I haven't seen anything broken yet.

rarias added 1 commit 2026-01-08 17:44:00 +01:00
Remove unneeded perf package from eudy
All checks were successful
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
d0e944d05c
It is already included in the base list of packages, which is now only
"perf" and doesn't depend on the kernel version.
rarias added 1 commit 2026-01-09 18:04:10 +01:00
Fix gitea user to allow sending email
All checks were successful
CI / build:cross (pull_request) Successful in 8s
CI / build:all (pull_request) Successful in 16s
fcfee6c674
In order to send email, the gitea user needs to be in the mail-robot
group.

Fixes: #220
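A sketch of the change, assuming the `mail-robot` group already exists in the config:

```nix
{
  # Add the gitea system user to the group that is allowed to send
  # mail, so Gitea can deliver email again.
  users.users.gitea.extraGroups = [ "mail-robot" ];
}
```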
Owner

Fox, owl1, owl2 and apex upgraded, no problems so far.

rarias force-pushed upgrade/25.11 from fcfee6c674 to 2577f6344b 2026-01-20 11:49:43 +01:00 Compare
rarias approved these changes 2026-01-20 12:31:32 +01:00
rarias force-pushed upgrade/25.11 from 2577f6344b to dda6a66782 2026-01-20 13:48:40 +01:00 Compare
rarias manually merged commit dda6a66782 into master 2026-01-20 13:51:15 +01:00
Reference: rarias/jungle#218