Draft: Mount the nix store from hut in compute nodes #66

Closed
rarias wants to merge 11 commits from shared-nix-store into master
rarias commented 2023-09-18 14:14:46 +02:00 (Migrated from pm.bsc.es)

This MR provides one of the remaining pieces needed for a working cluster with SLURM and Nix.

It mounts the nix store from hut into the compute nodes owl1 and owl2 by using an overlay. The overlay looks first in the store on the local disk (needed for boot) and then falls back to the hut store, which is mounted read-only over NFS. Example:

hut% nix eval nixpkgs#cowsay.outPath
"/nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0"
hut% ls /nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0
ls: cannot access '/nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0': No such file or directory
hut% ssh owl2 ls /nix/store | grep k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0
hut% nix build nixpkgs#cowsay
hut% ssh owl2 ls /nix/store | grep k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0
k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0

This allows users to run SLURM jobs on the compute nodes that read their dependencies directly from the hut store via NFS, avoiding the need to copy the closure for each execution.

It is also possible to run nix develop directly from the compute nodes, but this is not recommended, as it replicates the same data on the local disk of every compute node.
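
As a rough illustration of the lookup order described above, the following dry-run sketch prints the two mounts that such a setup might use. It is not the configuration from this MR: the local store path, the NFS mount point and the `hut:/nix/store` export are all hypothetical names chosen for the example. It relies only on the documented overlayfs behaviour that the leftmost `lowerdir` entry is the top layer, and that an overlay with no upper directory is read-only.

```shell
#!/bin/sh
# Sketch only: these paths and the NFS export are hypothetical and are
# not taken from this MR.
LOCAL_STORE=/nix/.local-store     # hypothetical: store on the node's disk
HUT_STORE=/mnt/hut-nix-store      # hypothetical: read-only NFS mount of hut

# Build the overlayfs option string. The leftmost lowerdir entry is the
# top layer, so the local store is searched before the hut store.
overlay_opts() {
    printf 'lowerdir=%s:%s' "$1" "$2"
}

# Print (do not run) the two mounts; actually running them needs root.
print_mounts() {
    echo "mount -t nfs -o ro hut:/nix/store $HUT_STORE"
    echo "mount -t overlay overlay -o $(overlay_opts "$LOCAL_STORE" "$HUT_STORE") /nix/store"
}

print_mounts
```

Because the script only prints the commands, it can be run safely on any machine to inspect the layer order before trying anything on a real node.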

Closes #23

rarias commented 2023-09-18 14:14:47 +02:00 (Migrated from pm.bsc.es)

assigned to @rarias

rarias commented 2023-09-18 14:24:16 +02:00 (Migrated from pm.bsc.es)

changed the description

rarias commented 2023-09-18 18:18:22 +02:00 (Migrated from pm.bsc.es)

added 4 commits

  • e6014511 - Wait for the NFS hut store to be mounted
  • 8c11c746 - Delay the mount until the network is ready
  • 1fc6891d - Remove nix-daemon.socket dependency of /nix/store
  • 77b41a90 - Patch nix instead of using an override unit

Compare with previous version

rarias commented 2023-09-18 19:34:05 +02:00 (Migrated from pm.bsc.es)

added 3 commits

  • e4cbcab8 - Use a systemd mount directly for the nix store
  • 3bb0b550 - Add a RequiredBy dependency for remote-fs.target
  • e065cde3 - Use NixOS attributes for the install section

Compare with previous version

rarias commented 2023-09-18 20:02:31 +02:00 (Migrated from pm.bsc.es)

marked this merge request as ready

rarias commented 2023-09-18 20:02:31 +02:00 (Migrated from pm.bsc.es)

changed the description

rarias commented 2023-09-18 20:02:31 +02:00 (Migrated from pm.bsc.es)

requested review from @arocanon

rarias commented 2023-09-19 11:18:18 +02:00 (Migrated from pm.bsc.es)

marked this merge request as draft

rarias commented 2023-09-19 11:18:29 +02:00 (Migrated from pm.bsc.es)

Blocked until #41

rarias closed this pull request 2024-07-22 12:22:34 +02:00
