Transition to a ceph nix store #42

Open
opened 2023-09-19 18:01:06 +02:00 by rarias · 12 comments
rarias commented 2023-09-19 18:01:06 +02:00 (Migrated from pm.bsc.es)

As discussed with Aleix and Vicenç, we would benefit from placing the nix store directly in the ceph filesystem and letting the compute nodes boot with it mounted at /nix/store. This would solve the overlay FS cache problems observed in #41 and also prepare the way to export the nix store to other nodes (ahem, MN4/5). It also gives the nix store more room and makes it more robust (3 redundant copies).

The nodes can boot directly from the network via PXE, so we don't have to worry about their disk state (they are essentially stateless). However, we must ensure that they **don't write into the nix database**. We can achieve this by mounting the nix store read-only.

We still need to be able to build some packages from inside the compute nodes (especially for debugging), so they must be able to write to the store through the nix daemon on hut. This is probably doable, as we already configured something similar for MN4.

Here is roughly the plan:

- [x] Ensure that nix build/develop/shell don't modify the store, but is all handled by the nix daemon (which will be on hut).
- [ ] Determine how to mount /nix/store via ceph early in the initrd, so we can continue the boot
- [ ] Also mount some paths for state/logs (maybe /var and some others)
- [x] Make a tunnel for the nix daemon socket
- [ ] Configure the node to use the remote nix daemon
- [ ] Fix references to /nix/var/nix/profiles/system to be per host.
- [x] Test that we can build packages locally from the node and be submitted to hut for build
- [ ] Prepare the PXE boot instead of reading the kernel from disk
- [ ] Switch the BIOS to boot via PXE
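
The read-only requirement in the plan can be checked on a node with a small helper. This is only a sketch: the function names `has_ro_option` and `nix_is_readonly` are made up for illustration, and it assumes the util-linux `findmnt` tool is available on the node.

```shell
# Succeed if a comma-separated mount option string contains the "ro" flag.
# Options such as "errors=remount-ro" do not count.
has_ro_option() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Succeed if the filesystem that holds /nix is mounted read-only.
# Hypothetical helper: it only inspects the mount options reported by findmnt.
nix_is_readonly() {
    has_ro_option "$(findmnt -n -o OPTIONS --target /nix)"
}
```

On a node, `nix_is_readonly && echo "store is read-only"` would confirm that nothing on the node can write to the shared store directly, so all store changes have to go through the daemon on hut.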
rarias commented 2023-09-19 18:01:07 +02:00 (Migrated from pm.bsc.es)

assigned to @rarias

rarias commented 2023-09-20 09:54:27 +02:00 (Migrated from pm.bsc.es)

The nix store in ceph doesn't really need 3 redundant copies, as the data stored there can be easily recovered, so let's create another pool just for the nix store with just 2 copies.
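
A possible way to set this up is sketched below. The pool name and the directory are assumptions made for illustration; the filesystem name `cephfs` is taken from the mount source that shows up later in this thread. The helper only prints the commands so they can be reviewed before anyone runs them against the cluster.

```shell
# Assumed names, adjust to the real cluster.
NIX_POOL="nixstore-data"   # hypothetical name for the new data pool
CEPH_FS="cephfs"           # CephFS filesystem name
NIX_DIR="/ceph/nixstore"   # assumed directory that will hold the nix tree

# Print (do not run) the commands that would create a 2-replica pool,
# attach it to the filesystem and direct new files in NIX_DIR to it.
nix_pool_plan() {
    echo "ceph osd pool create $NIX_POOL"
    echo "ceph osd pool set $NIX_POOL size 2"
    echo "ceph fs add_data_pool $CEPH_FS $NIX_POOL"
    echo "setfattr -n ceph.dir.layout.pool -v $NIX_POOL $NIX_DIR"
}

nix_pool_plan
```

Note that a directory layout only affects files created after it is set, and the ceph client used by the nodes would also need access to the new pool.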

rarias commented 2023-09-20 11:19:29 +02:00 (Migrated from pm.bsc.es)

Using a remote store seems to allow building:

```
owl1% nix build --eval-store auto --store ssh-ng://hut nixpkgs#fortune -v
don't know how to build these paths:
  /nix/store/y93pnpx57i41s213mhs8vsbimphwqy1z-fortune-mod-3.20.0.drv
copying 7 paths...
copying path '/nix/store/6h31fhs0isp3ddw8yfdb22cvqlnld2ac-fortune-mod-3.20.0.tar.xz.drv' to 'ssh-ng://hut'...
copying path '/nix/store/kk1rj1d0b12id9d68rwv8kwpkvhydahz-recode-3.7.12.tar.gz.drv' to 'ssh-ng://hut'...
copying path '/nix/store/hqp92wja0gvz1qpq2gfwngkcmy9ncpkp-recode-3.7.12.drv' to 'ssh-ng://hut'...
copying path '/nix/store/iha2vrjyw8hcpp89nn6vb834vvlm6dg2-rinutils-0.10.2.tar.xz.drv' to 'ssh-ng://hut'...
copying path '/nix/store/j37gq16nparslzcx2pnxq4zdvwabwdnz-rinutils-0.10.2.drv' to 'ssh-ng://hut'...
copying path '/nix/store/x64krg9bjynaz9fb4v0ich7l8rz2cfhs-not-a-game.patch' to 'ssh-ng://hut'...
copying path '/nix/store/y93pnpx57i41s213mhs8vsbimphwqy1z-fortune-mod-3.20.0.drv' to 'ssh-ng://hut'...
copying path '/nix/store/gih4q0345mx1kw9snryiqnqn6i0l2pcy-recode-3.7.12' from 'https://cache.nixos.org'...
copying path '/nix/store/0dkc04dcw3qcbvq1cnn4jbm1dz4d4pj4-fortune-mod-3.20.0' from 'https://cache.nixos.org'...
```

But entering a shell does not work:

```
owl1% nix shell --eval-store auto --store ssh-ng://hut nixpkgs#fortune -v
copying 0 paths...
error: store 'ssh-ng://hut' is not a local store so it does not support command execution
```
rarias commented 2023-09-20 12:04:57 +02:00 (Migrated from pm.bsc.es)

With the mounts shown in this findmnt output:

```
owl1% findmnt
TARGET                                   SOURCE                              FSTYPE     OPTIONS
/                                        /dev/disk/by-label/nixos            ext4       rw,relatime
...
├─/home                                  10.0.40.30:/home                    nfs        rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,proto=tcp,tim
├─/ceph                                  user@9c8d06e0-485f-4aaf-b16b-06d6daf1232b.cephfs=/
│                                                                            ceph       rw,relatime,name=user,secret=<hidden>,acl,mon_addr=10.0.40.40
│ ├─/ceph/nixstore                       user@9c8d06e0-485f-4aaf-b16b-06d6daf1232b.cephfs=/[/nixstore]
│ │                                                                          ceph       ro,relatime,name=user,secret=<hidden>,acl,mon_addr=10.0.40.40
│ │ └─/ceph/nixstore/nix/var/nix/daemon-socket
│ │                                      /dev/disk/by-label/nixos[/var/nix/daemon-socket]
│ │                                                                          ext4       rw,relatime
│ └─/ceph/nixstore/nix/var/nix/daemon-socket
│                                        /dev/disk/by-label/nixos[/var/nix/daemon-socket]
│                                                                            ext4       rw,relatime
└─/nix                                   user@9c8d06e0-485f-4aaf-b16b-06d6daf1232b.cephfs=/[/nixstore/nix]
                                                                             ceph       ro,relatime,name=user,secret=<hidden>,acl,mon_addr=10.0.40.40
  └─/nix/var/nix/daemon-socket           /dev/disk/by-label/nixos[/var/nix/daemon-socket]
                                                                             ext4       rw,relatime
```

and the socat hack to allow access to the hut daemon, I'm able to build and enter a shell. So far, so good.

The gcroots are created by the hut daemon, so they are only guaranteed to be respected if they point to a place in the shared filesystem; otherwise they will be destroyed.

Another problem is how to avoid collisions of /nix/var/nix/profiles/system among nodes.
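
The socat command itself is not shown in this thread. One plausible shape is sketched below, under these assumptions: the node-local socket lives at the default path `/nix/var/nix/daemon-socket/socket` (the directory that is bind-mounted read-write above), the user running socat can ssh into hut without a password, and `nix-daemon --stdio` is used on the remote side to speak the daemon protocol over stdin/stdout. The helper names are hypothetical.

```shell
# Assumed values: default nix daemon socket path and the build host "hut".
SOCKET="/nix/var/nix/daemon-socket/socket"
BUILD_HOST="hut"

# socat address that listens on the node-local socket, one child per client.
listen_address() {
    printf 'UNIX-LISTEN:%s,fork,unlink-early\n' "$SOCKET"
}

# socat address that relays each connection to the daemon on the build host.
remote_address() {
    printf 'EXEC:ssh %s nix-daemon --stdio\n' "$BUILD_HOST"
}

# Run the tunnel in the foreground (hypothetical helper, not called here).
start_daemon_tunnel() {
    socat "$(listen_address)" "$(remote_address)"
}
```

With the tunnel running, nix clients on the node would reach hut's daemon through the local socket, for example with `NIX_REMOTE=daemon nix build nixpkgs#fortune`.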
rarias commented 2023-09-20 15:31:10 +02:00 (Migrated from pm.bsc.es)

In order to keep multiple versions of old system profiles per node, I need to adjust the symlinks so they don't clash with each other, see pkgs/os-specific/linux/nixos-rebuild/default.nix.
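
As an illustration of what non-clashing symlinks could look like, the sketch below derives one profile path per host. The `per-host/<hostname>` layout is an assumption made here by analogy with the existing `per-user` profiles directory; it is not something nixos-rebuild provides today.

```shell
PROFILES_DIR="/nix/var/nix/profiles"

# Print the system profile path for a given host name.
# The "per-host" subdirectory is a hypothetical layout.
system_profile_for() {
    printf '%s/per-host/%s/system\n' "$PROFILES_DIR" "$1"
}

# Example: where the current node's profile would live under this scheme.
system_profile_for "$(uname -n)"
```

Each node would then get its own chain of generation links next to its `system` symlink, so old generations of owl1 and owl2 could coexist in the shared store.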

rarias commented 2023-09-20 15:31:40 +02:00 (Migrated from pm.bsc.es)

changed the description

rarias commented 2023-09-20 15:33:13 +02:00 (Migrated from pm.bsc.es)

marked the checklist item Ensure that nix build/develop/shell don't modify the store, but is all handled by the nix daemon (which will be on hut). as completed

rarias commented 2023-09-20 15:33:23 +02:00 (Migrated from pm.bsc.es)

marked the checklist item Make a tunnel for the nix daemon socket as completed

rarias commented 2023-09-20 15:33:36 +02:00 (Migrated from pm.bsc.es)

marked the checklist item Test that we can build packages locally from the node and be submitted to hut for build as completed

rarias commented 2023-09-20 16:04:39 +02:00 (Migrated from pm.bsc.es)

We will also need to patch the script that finds the grub entries: https://github.com/NixOS/nixpkgs/blob/a39526b3ef4488fdb98eabb0cef0985e671a2b5c/nixos/modules/system/boot/loader/grub/install-grub.pl#L601

We can safely assume that GRUB will be installed from the same host that contains the disk on which it is installed. This only applies to disk installations, not to PXE booting.
rarias commented 2023-09-20 19:12:27 +02:00 (Migrated from pm.bsc.es)

If we netboot, all the state is stored in hut, so there is no need to do any `nixos-rebuild ... --target-host` anymore.
rarias commented 2023-09-20 19:51:18 +02:00 (Migrated from pm.bsc.es)

mentioned in merge request !24

Reference: rarias/jungle#42