Shared nix store across compute nodes #23

Open
opened 2023-06-20 14:58:04 +02:00 by rarias · 13 comments
rarias commented 2023-06-20 14:58:04 +02:00 (Migrated from pm.bsc.es)

We should configure the compute nodes to directly access the nix store at hut, so we don't have to copy the closures manually when launching SLURM jobs.

A possible option is to configure an overlay file system: mount the read-only nix store over NFS (or another remote FS) and then set up an overlay on top of it, so that the kernel needed to boot is always available locally:

https://discourse.nixos.org/t/sharing-nix-store-between-containers/9733/16
https://github.com/NixOS/nixpkgs/blob/8346dc04b31bc4ade35bd15a72e6cc40c8ac2f73/nixos/modules/installer/netboot/netboot.nix#L39-L70
https://github.com/NixOS/nixpkgs/blob/8346dc04b31bc4ade35bd15a72e6cc40c8ac2f73/nixos/modules/installer/netboot/netboot.nix#L106-L108
rarias commented 2023-09-04 14:16:06 +02:00 (Migrated from pm.bsc.es)

We may be able to use the new Ceph filesystem for that with an overlay.

rarias commented 2023-09-06 17:41:51 +02:00 (Migrated from pm.bsc.es)

So, the ceph FS may not even be needed for now, as we can expose the read-only hut store via NFS (already being exported) and make an overlay in compute nodes using the disk nix store as lower.

This needs to work when hut is down, otherwise we won't be able to boot. The rw store will contain the bootstrap to boot the system.

The current problem is that overlayfs refuses to use a read-only directory such as /nix/store as the upper dir (it is read-only due to systemd).

rarias commented 2023-09-07 12:49:08 +02:00 (Migrated from pm.bsc.es)
May be interesting: https://talks.nixcon.org/nixcon-2023/talk/GXW3EX/
rarias commented 2023-09-18 10:12:54 +02:00 (Migrated from pm.bsc.es)

I was able to mount the hut nix store in owl1 by doing the following procedure:

  • Disable the ro mount point of /nix/store by issuing: mount --bind -o remount,rw /nix/store
  • Mount the hut nix store via NFS: mount -o ro hut:/nix /mnt/nix-hut
  • Then mount the overlay over the store itself using the NFS store as lower dir: mount -t overlay overlay -o lowerdir=/mnt/nix-hut,upperdir=/nix,workdir=/mnt/nix-work /nix

Mounting only /nix/store fails. I suspect this is because the upper and work directories have to be in the same filesystem, but /nix/store is a bind mount. So I mounted the overlay on /nix directly.
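
The procedure above can be sketched as a dry-run script that only prints the commands, so they can be reviewed before running them as root. The function names overlay_opts and plan_mounts are hypothetical helpers for illustration; the paths and mount options are the ones used above.

```shell
# Dry-run sketch: prints the mount commands instead of executing them.

# Build the overlay option string from lower, upper and work directories.
overlay_opts() {
  printf 'lowerdir=%s,upperdir=%s,workdir=%s' "$1" "$2" "$3"
}

# Print the three steps in order: remount the store rw, mount the NFS
# export read-only, then mount the overlay on top of the upper directory.
plan_mounts() {
  nfs_src=$1; lower=$2; upper=$3; work=$4
  echo "mount --bind -o remount,rw ${upper}/store"
  echo "mount -o ro ${nfs_src} ${lower}"
  echo "mount -t overlay overlay -o $(overlay_opts "$lower" "$upper" "$work") ${upper}"
}

plan_mounts hut:/nix /mnt/nix-hut /nix /mnt/nix-work
```

Keeping the option string in one helper makes it easier to swap the lower, upper and work directories when testing other layouts.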

rarias commented 2023-09-18 12:01:33 +02:00 (Migrated from pm.bsc.es)

Installing packages doesn't work; I broke the nix database:

error: executing SQLite statement 'pragma synchronous = normal': unable to open database file, unable to open database file (in '/nix/var/nix/db/db.sqlite')

I'm thinking that nix-daemon doesn't need to "see" the derivations in the mount point. So we can set up a private mount for nix-daemon, while the rest of the system sees /mnt/hut-nix-store + /nix/store, both read-only.

This will allow jobs launched from the login node to run from a shell with all the software loaded. However, nix shell won't work from the compute nodes, as nix-daemon won't see the derivations.

rarias commented 2023-09-18 12:34:23 +02:00 (Migrated from pm.bsc.es)

Using a ro store doesn't work either, as the daemon seems to be unable to write to it:

owl1% findmnt /nix/store
TARGET     SOURCE  FSTYPE  OPTIONS
/nix/store overlay overlay ro,relatime,lowerdir=/nix/store:/mnt/hut-nix-store,redirect_dir=on

owl1% nix shell nixpkgs#cowsay
error:
       … while evaluating a branch condition

         at /nix/store/wl5m5xfayd69ycyspzyd4rilfgl6wmh0-source/pkgs/stdenv/booter.nix:99:7:

           98|     thisStage =
           99|       if args.__raw or false
             |       ^
          100|       then args'

       … in the right operand of the update (//) operator

         at /nix/store/wl5m5xfayd69ycyspzyd4rilfgl6wmh0-source/pkgs/stdenv/booter.nix:84:7:

           83|       { allowCustomOverrides = index == 1; }
           84|       // (stageFun prevStage))
             |       ^
           85|     (lib.lists.reverseList stageFuns);

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: opening lock file '/nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh.lock': Read-only file system

Let's try with rw overlay, using a work directory.

rarias commented 2023-09-18 12:54:45 +02:00 (Migrated from pm.bsc.es)

Using a rw overlay seems to work:

owl1% findmnt /nix/store
TARGET     SOURCE  FSTYPE  OPTIONS
/nix/store overlay overlay rw,relatime,lowerdir=/mnt/hut-nix-store,upperdir=/nix/store,workdir=/mnt/nix-work

owl1% nix shell nixpkgs#cowsay

owl1% cowsay it works
 __________
< it works >
 ----------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

The derivations still need to be built, so it wouldn't be very practical if users run nix shell/develop from the compute node instead of running it from hut.

rarias commented 2023-09-18 14:04:26 +02:00 (Migrated from pm.bsc.es)

Configuring the overlay with fileSystems."/nix/store" causes stage 1 to attempt to mount the filesystem during boot, even with the neededForBoot option set to false, see: https://github.com/NixOS/nixpkgs/blob/17a46d09ac123d0da3a26855bf3af7db01f9c751/nixos/lib/utils.nix#L14

A simple hack is to add a double slash, so the check no longer matches the path but the fs can be mounted anyway:

  fileSystems."/nix//store" = {
    device = "overlay";
    fsType = "overlay";
    options = [ "lowerdir=/mnt/hut-nix-store,upperdir=/nix/store,workdir=/mnt/nix-work" ];
    depends = [ "/nix/store" "/mnt/hut-nix-store" "/mnt/nix-work" ];
  };
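
A possible explanation of why the double slash works, assuming the check compares the mount point as a plain string against a fixed list of paths (an assumption, not verified here against that nixpkgs revision): the two strings differ, while path resolution treats repeated slashes as a single one, so both names refer to the same directory. A minimal sketch of that difference, where normalize is a hypothetical helper:

```shell
# Hypothetical illustration: string comparison vs. path normalization.
listed=/nix/store      # path assumed to be in the needed-for-boot list
hack=/nix//store       # mount point written with a double slash

# Collapse runs of slashes into one, as path resolution does.
normalize() {
  printf '%s\n' "$1" | tr -s /
}

if [ "$hack" != "$listed" ]; then
  echo "string comparison: different"
fi
if [ "$(normalize "$hack")" = "$listed" ]; then
  echo "after normalization: same path"
fi
```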
rarias commented 2023-09-18 14:14:50 +02:00 (Migrated from pm.bsc.es)

mentioned in merge request !22

rarias commented 2023-09-18 14:31:11 +02:00 (Migrated from pm.bsc.es)

It seems to be attempting to mount the overlay in the incorrect order, before /mnt/hut-nix-store is available:

[    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0xb000020 (or later)
[    0.000039] x86/cpu: VMX (outside TXT) disabled by BIOS
[    6.777619] pstore: Unknown compression: deflate

<<< NixOS Stage 1 >>>

loading module dm_mod...
running udev...
Starting systemd-udevd version 253.6
kbd_mode: KDSKBMiate ioctl for device
starting device mapper and LVM...
checking /dev/disk/by-label/nixos...
fsck (busybox 1.36.1)
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/disk/by-label/nixos
nixos: clean, 722704/14655488 files, 4099003/58607360 blocks
mounting /dev/disk/by-label/nixos on /...

<<< NixOS Stage 2 >>>

running activation script...
[agenix] creating new generation in /run/agenix.d/1
[agenix] decrypting secrets...
decrypting '/nix/store/573kahhid1p3f0nxd3ffqdmn2n1xcrq6-ceph-user.age' to '/run/agenix.d/1/cephUser'...
decrypting '/nix/store/f3rzdnqg6kkshpr8kh5fx8vd13zjb3aj-munge-key.age' to '/run/agenix.d/1/mungeKey'...
[agenix] symlinking new secrets to /run/agenix (generation 1)...
[agenix] chowning...
setting up /etc...
starting systemd...

Welcome to NixOS 23.11 (Tapir)!

[  OK  ] Created slice Slice /system/getty.
[  OK  ] Created slice Slice /system/modprobe.
[  OK  ] Created slice Slice /system/serial-getty.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Containers.
[  OK  ] Reached target Path Units.
[  OK  ] Reached target Slice Units.
[  OK  ] Listening on RPCbind Server Activation Socket.
[  OK  ] Reached target RPC Port Mapper.
[  OK  ] Listening on Process Core Dump Socket.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on Userspace Out-Of-Memory (OOM) Killer Socket.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on udev Kernel Socket.
         Mounting Huge Pages File System...
         Mounting POSIX Message Queue File System...
         Mounting /nix/store...
         Mounting Kernel Debug File System...
[    9.784966] overlayfs: failed to resolve '/mnt/hut-nix-store': -2
         Mounting /sys/kernel/tracing...
         Starting Create List of Static Device Nodes...
         Starting Load Kernel Module configfs...
         Starting Load Kernel Module drm...
         Starting Load Kernel Module efi_pstore...
         Starting Load Kernel Module fuse...
         Starting mount-pstore.service...
         Starting Journal Service...
         Starting Load Kernel Modules...
         Starting Remount Root and Kernel File Systems...
         Starting Coldplug All udev Devices...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted POSIX Message Queue File System.
[    9.941191] systemd[1]: Failed to mount /nix/store.
[FAILED] Failed to mount /nix/store.
See 'systemctl status nix-store.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
[  OK  ] Started Journal Service.
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Mounted /sys/kernel/tracing.
[  OK  ] Finished Create List of Static Device Nodes.
[  OK  ] Finished Load Kernel Module configfs.
[   10.108110] kvm_intel: VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL on CPU 41
[  OK  ] Finished Load Kernel Module drm.
[  OK  ] Finished Load Kernel Module efi_pstore.
[  OK  ] Finished Load Kernel Module fuse.
[  OK  ] Finished Remount Root and Kernel File Systems.
[  OK  ] Finished Coldplug All udev Devices.
[  OK  ] Finished mount-pstore.service.
[  OK  ] Stopped Dispatch Password …ts to Console Directory Watch.
[  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
[  OK  ] Reached target Timer Units.
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Reached target Login Prompts.
         Starting Flush Journal to Persistent Storage...
         Starting Load/Save OS Random Seed...
[  OK  ] Reached target Socket Units.
         Mounting FUSE Control File System...
         Mounting Kernel Configuration File System...
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Finished Load Kernel Modules.
[  OK  ] Finished Flush Journal to Persistent Storage.
[  OK  ] Finished Load/Save OS Random Seed.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Mounted Kernel Configuration File System.
[  OK  ] Finished Create Static Device Nodes in /dev.
[  OK  ] Reached target Preparation for Local File Systems.
         Starting Firewall...
         Starting Apply Kernel Variables...
         Starting Create Volatile Files and Directories...
         Starting Rule-based Manage…for Device Events and Files...
[  OK  ] Finished Apply Kernel Variables.
[  OK  ] Finished Create Volatile Files and Directories.
[  OK  ] Started Rule-based Manager for Device Events and Files.
         Mounting RPC Pipe File System...
         Starting RPC Bind...
         Starting Userspace Out-Of-Memory (OOM) Killer...
         Starting Record System Boot/Shutdown in UTMP...
[  OK  ] Finished Firewall.
[  OK  ] Started RPC Bind.
[  OK  ] Started Userspace Out-Of-Memory (OOM) Killer.
[  OK  ] Mounted RPC Pipe File System.
[  OK  ] Finished Record System Boot/Shutdown in UTMP.
[  OK  ] Found device INTEL_SSDSC2BB240G7 swap.
[  OK  ] Reached target Preparation for Network.
[  OK  ] Reached target All Network Interfaces (deprecated).
[  OK  ] Reached target Network.
[  OK  ] Reached target Network is Online.
[  OK  ] Reached target rpc_pipefs.target.
         Activating swap /dev/disk/by-label/swap...
[  OK  ] Reached target NFS client services.
[  OK  ] Reached target Preparation for Remote File Systems.
         Mounting /ceph...
         Mounting /home...
         Mounting /mnt/hut-nix-store...
         Starting Notify NFS peers of a restart...
[  OK  ] Activated swap /dev/disk/by-label/swap.
[  OK  ] Started Notify NFS peers of a restart.
[  OK  ] Reached target Swaps.
[  OK  ] Stopped target Emergency Mode.
[   11.214521] libceph: connect (1)10.0.40.40:6789 error -101
         Mountin[   11.221381] libceph: connect (1)10.0.40.40:6789 error -101
g /nix/store...
         Starting Load Kernel Module efi_pstore...
[  OK  ] Stopped Emergency Shell.
[  OK  ] Mounted /nix/store.
[  OK  ] Finished Load Kernel Module efi_pstore.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[  OK  ] Reached target Local File Systems.
[  OK  ] Reached target System Initialization.
         Starting Name Service Cache Daemon (nsncd)...
[  OK  ] Started Name Service Cache Daemon (nsncd).
[  OK  ] Reached target Host and Network Name Lookups.
[  OK  ] Reached target User and Group Name Lookups.
         Starting NFS status monitor for NFSv2/3 locking....
[FAILED] Failed to mount /mnt/hut-nix-store.
See 'systemctl status "mnt-hut\\x2dnix\\x2dstore.mount"' for details.
[DEPEND] Dependency failed for Remote File Systems.
[  OK  ] Started NFS status [   11.481081] libceph: connect (1)10.0.40.40:6789 error -101
monitor for NFSv2/3 locking..
[FAILED] Failed to mount /home.
See 'systemctl status home.mount' for details.
[   11.994036] libceph: connect (1)10.0.40.40:6789 error -101
[   13.657053] libceph: connect (1)10.0.40.40:6789 error -101
[   14.553052] libceph: connect (1)10.0.40.40:6789 error -101
[   14.817091] libceph: connect (1)10.0.40.40:6789 error -101
[   15.329039] libceph: connect (1)10.0.40.40:6789 error -101
[   16.601056] libceph: connect (1)10.0.40.40:6789 error -101
[**    ] A start job is running for /ceph (7s / 1min 31s)
[   17.561064] libceph: connect (1)10.0.40.40:6789 error -101
[***   ] A start job is running for /ceph (8s / 1min 31s)
[  *** ] A start job is running for /ceph (9s / 1min 31s)
...
[   ***] A start job is running for /ceph (1min 2s / 1min 31s)
[     *] A start job is running for /ceph (1min 3s / 1min 31s)
[   ***] A start job is running for /ceph (1min 4s / 1min 31s)
[FAILED] Failed to mount /ceph.
See 'systemctl status ceph.mount' for details.
rarias commented 2023-09-18 14:36:21 +02:00 (Migrated from pm.bsc.es)

The mount unit doesn't seem to have the proper Requires dependency. Maybe we should create a systemd mount unit directly instead of relying on the NixOS fileSystems option.
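
A hand-written mount unit has to be named after the mount point it controls, which is where names such as mnt-hut\x2dnix\x2dstore.mount come from. The sketch below derives the name for simple paths; mount_unit_name is a hypothetical helper that only handles '/' and '-', so on a real system `systemd-escape -p --suffix=mount PATH` is the safer choice.

```shell
# Simplified sketch: derive the mount unit name for an absolute path.
# Only '/' and '-' are handled; other special characters are not escaped.
mount_unit_name() {
  name=$(printf '%s\n' "$1" |
    sed -e 's|^/*||' -e 's|/*$||' -e 's|-|\\x2d|g' -e 's|//*|-|g')
  # The root directory maps to "-.mount".
  [ -n "$name" ] || name=-
  printf '%s.mount\n' "$name"
}

mount_unit_name /nix/store
mount_unit_name /mnt/hut-nix-store
```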

rarias commented 2023-09-18 16:30:12 +02:00 (Migrated from pm.bsc.es)

After fixing some boot dependency problems, there is still a cycle in the order of the units at boot:

[   10.440234] systemd[1]: network-addresses-eno1.service: Job sockets.target/start deleted to break ordering cycle starting with network-addresses-eno1.service/start
[ SKIP ] Ordering cycle found, skipping Socket Units

...

Sep 18 16:14:43 owl2 systemd-fstab-generator[1350]: Checking was requested for "user@9c8d06e0-485f-4aaf-b16b-06d6daf1232b.cephfs=/", but it is not a device.
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found ordering cycle on basic.target/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found dependency on sockets.target/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found dependency on nix-daemon.socket/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found dependency on nix-store.mount/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found dependency on mnt-hut\x2dnix\x2dstore.mount/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found dependency on network-online.target/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found dependency on network.target/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Found dependency on network-addresses-eno1.service/start
Sep 18 16:14:43 owl2 systemd[1]: network-addresses-eno1.service: Job sockets.target/start deleted to break ordering cycle starting with network-addresses-eno1.service/start
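
Reading the messages above as a chain makes the cycle easier to see. One plausible reading, not confirmed in this thread: nix-daemon.socket requires /nix/store, which is now the overlay and so depends on the NFS mount, which waits for the network, while network-addresses-eno1.service appears to be ordered after basic.target and sockets.target itself. The sketch below extracts the unit names from such messages; cycle_units and join_chain are hypothetical helpers, and the sample input is copied from the log above, trimmed to the message part.

```shell
# Extract the unit names from systemd ordering-cycle messages on stdin.
cycle_units() {
  sed -n \
    -e 's|.*Found ordering cycle on \(.*\)/start$|\1|p' \
    -e 's|.*Found dependency on \(.*\)/start$|\1|p'
}

# Join the input lines with " -> " to show the chain on one line.
join_chain() {
  awk 'NR > 1 { printf " -> " } { printf "%s", $0 } END { print "" }'
}

cycle_units <<'EOF' | join_chain
network-addresses-eno1.service: Found ordering cycle on basic.target/start
network-addresses-eno1.service: Found dependency on sockets.target/start
network-addresses-eno1.service: Found dependency on nix-daemon.socket/start
network-addresses-eno1.service: Found dependency on nix-store.mount/start
network-addresses-eno1.service: Found dependency on mnt-hut\x2dnix\x2dstore.mount/start
network-addresses-eno1.service: Found dependency on network-online.target/start
network-addresses-eno1.service: Found dependency on network.target/start
network-addresses-eno1.service: Found dependency on network-addresses-eno1.service/start
EOF
```

On a running node the same helpers could be fed from the journal, for example with `journalctl -b | cycle_units | join_chain`.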
rarias commented 2023-09-18 17:42:31 +02:00 (Migrated from pm.bsc.es)

Using an override file adds the mount point to the list, instead of replacing it:

owl2% sudo systemctl cat nix-daemon.socket
# /etc/systemd/system/nix-daemon.socket
[Unit]
Description=Nix Daemon Socket
Before=multi-user.target
RequiresMountsFor=/nix/store
ConditionPathIsReadWrite=/nix/var/nix/daemon-socket

[Socket]
ListenStream=/nix/var/nix/daemon-socket/socket

[Install]
WantedBy=sockets.target

# /nix/store/4anvwl7mg5bdlnjaxv95yy1s6a2xja0z-system-units/nix-daemon.socket.d/overrides.conf
[Unit]
RequiresMountsFor=/nix/var/nix/daemon-socket

[Socket]



owl2% sudo systemctl show nix-daemon.socket | grep RequiresMountsFor
RequiresMountsFor=/nix/store /nix/var/nix/daemon-socket/socket /nix/var/nix/daemon-socket
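
This matches how systemd treats dependency settings in drop-in files: they can only be added to, not reset, so removing /nix/store from the list would require overriding the whole unit. A small sketch for checking whether a path is still among the requirements; mount_reqs and requires_mount are hypothetical helpers, and the sample line is the output shown above.

```shell
# List the paths from a "RequiresMountsFor=..." line, one per line.
mount_reqs() {
  printf '%s\n' "${1#RequiresMountsFor=}" | tr ' ' '\n'
}

# Succeed if the given path is exactly one of the required mounts.
requires_mount() {
  mount_reqs "$1" | grep -Fxq -- "$2"
}

line='RequiresMountsFor=/nix/store /nix/var/nix/daemon-socket/socket /nix/var/nix/daemon-socket'
if requires_mount "$line" /nix/store; then
  echo "/nix/store is still required"
fi
```

On a live system the line can be obtained with `systemctl show -p RequiresMountsFor nix-daemon.socket`.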
Reference: rarias/jungle#23