Use hut substituter in all nodes #175

Manually merged
abonerib merged 2 commits from abonerib/jungle:hut-substituter into master 2025-09-29 18:48:06 +02:00
Collaborator

All nodes should be able to reach hut now so we can run the cache without going through nginx.

This makes hut the substituter for all nodes, but we still have tent as the public facing cache.

I don't know if we want bay and eudy to use the cache.
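
For readers unfamiliar with the setup, a NixOS module that points a node at hut could look roughly like the following. This is a sketch under assumptions, not the diff from this PR: the `http://hut/cache` URL is inferred from the `/cache` path used later in this thread, and the public key is a placeholder.

```nix
{
  nix.settings = {
    # Binary caches to query before building locally.
    # Hypothetical URL: hut reached directly over the internal network.
    substituters = [ "http://hut/cache" ];

    # Key that signs the store paths served by the cache.
    # Placeholder value, replace with the real public key.
    trusted-public-keys = [ "hut-cache-example:REPLACE_WITH_REAL_KEY" ];
  };
}
```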

abonerib added 1 commit 2025-09-26 12:42:10 +02:00
Owner

> I don't know if we want `bay` and `eudy` to use the cache.

Yes, I think it would be good for all machines to try fetching packages locally as the link is faster than going to cache.nixos.org. Notice lake2 is missing.

What happens if the cache is not reachable (hut is down)? I remember there was some painful issue with the timeouts. I'd imagine a way of testing is to specify a non-existing node, you can try with 10.0.40.111 instead of hut.

rarias requested review from rarias 2025-09-26 13:16:42 +02:00
Author
Collaborator

> > I don't know if we want `bay` and `eudy` to use the cache.
>
> Yes, I think it would be good for all machines to try fetching packages locally as the link is faster than going to cache.nixos.org. Notice lake2 is missing.

Added lake2.

> What happens if the cache is not reachable (hut is down)? I remember there was some painful issue with the timeouts. I'd imagine a way of testing is to specify a non-existing node, you can try with `10.0.40.111` instead of hut.

Just tried on weasel. It gave some timeout warnings and moved on after 5 attempts (took ~30s in total, including evaluation):

warning: error: unable to download 'http://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 80 after 3068 ms: Could not connect to server; retrying in 287 ms
warning: error: unable to download 'http://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 80 after 3104 ms: Could not connect to server; retrying in 643 ms
warning: error: unable to download 'http://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 80 after 3068 ms: Could not connect to server; retrying in 1405 ms
warning: error: unable to download 'http://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 80 after 3074 ms: Could not connect to server; retrying in 2224 ms
warning: unable to download 'http://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 80 after 3086 ms: Could not connect to server

Worryingly, on my laptop each timeout is 135 seconds instead, so the five attempts would take more than 10 minutes...

$ nix build --substituters http://10.0.40.111/cache .#bsc-ci.all -L
warning: error: unable to download 'http://10.0.40.111/cache/nix-cache-info': Timeout was reached (28) Failed to connect to 10.0.40.111 port 80 after 134400 ms: Could not connect to server; retrying in 272 ms
error: interrupted by the user

We can try the other nodes with `nix build --substituters https://10.0.40.111/cache` (from a trusted user) and see if it's just some weird config on my side.
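
As a rough sanity check of those numbers, the Nix expression below adds up the connect times and backoff delays from the weasel log, and estimates the laptop case. The laptop figure is an assumption: it supposes all five attempts hit the same 134400 ms timeout with backoffs similar to weasel's, which the interrupted log does not confirm. It can be evaluated with `nix-instantiate --eval --strict` on a file containing it.

```nix
let
  # Sum a list of integers.
  sum = builtins.foldl' (acc: x: acc + x) 0;

  # Per-attempt connect times on weasel, in ms, taken from the log above.
  weaselConnectMs = [ 3068 3104 3068 3074 3086 ];

  # Backoff delays between attempts, in ms, taken from the same log.
  backoffMs = [ 287 643 1405 2224 ];

  # Assumption: the laptop would time out after 134400 ms on all 5 attempts.
  laptopConnectMs = builtins.genList (_: 134400) 5;
in
{
  weaselTotalMs = sum weaselConnectMs + sum backoffMs; # 19959
  laptopTotalMs = sum laptopConnectMs + sum backoffMs; # 676559
}
```

That is about 20 s of the ~30 s observed on weasel (the rest being evaluation), and a bit over 11 minutes on the laptop.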

abonerib force-pushed hut-substituter from c993962708 to 1ad98a3b49 2025-09-26 15:00:38 +02:00
Owner

It seems there is a system-wide timeout of 3 seconds, and Nix gives up after 5 attempts:

apex% time nc -v 10.0.40.111 80
nc: connect to 10.0.40.111 port 80 (tcp) failed: No route to host
nc -v 10.0.40.111 80  0,00s user 0,00s system 0% cpu 3,057 total

apex% nix build -v --substituters https://10.0.40.111/cache .#bench6
warning: error: unable to download 'https://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 443 after 3103 ms: Could not connect to server; retrying in 307 ms
warning: error: unable to download 'https://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 443 after 3084 ms: Could not connect to server; retrying in 529 ms
warning: error: unable to download 'https://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 443 after 3054 ms: Could not connect to server; retrying in 1315 ms
warning: error: unable to download 'https://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 443 after 3100 ms: Could not connect to server; retrying in 2565 ms
warning: unable to download 'https://10.0.40.111/cache/nix-cache-info': Could not connect to server (7) Failed to connect to 10.0.40.111 port 443 after 3065 ms: Could not connect to server
these 602 derivations will be built:
...

Also, I assume this happens once per build command, not for each package, otherwise it would be pretty bad.

Owner
Maybe lowering this to 1s? https://nix.dev/manual/nix/2.24/command-ref/conf-file.html#conf-connect-timeout
abonerib added 1 commit 2025-09-29 09:45:54 +02:00
Author
Collaborator

> Maybe lowering this to 1s?
>
> https://nix.dev/manual/nix/2.24/command-ref/conf-file.html#conf-connect-timeout

That seems to fix my laptop timeout.

What's interesting is that the command-line flag does not work; I have to set it in the system `nix.conf`:

$ nix build --connect-timeout 1 --substituters http://10.0.40.111/cache .#bsc-ci.all -L
warning: error: unable to download 'http://10.0.40.111/cache/nix-cache-info': Timeout was reached (28) Connection timed out after 5000 milliseconds; retrying in 281 ms

🤷

Author
Collaborator

> It seems there is a system-wide timeout of 3 seconds

Yes. I have now set it explicitly, since I am not sure why on my laptop the timeout is much higher.

> Also, I assume this happens once per build command, not for each package, otherwise it would be pretty bad.

Yes, all my attempts to build multiple packages do the initial cache timeout and then just build locally.
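
Setting the timeout explicitly, as mentioned above, could look like this in a NixOS module. This is a sketch: `connect-timeout` is the `nix.conf` setting linked earlier in the thread, but the value of 1 second is only the one suggested in the discussion, not necessarily what the final commit uses.

```nix
{
  nix.settings = {
    # Seconds to wait when establishing a connection to a substituter.
    # Assumed value, taken from the "lower this to 1s" suggestion.
    connect-timeout = 1;
  };
}
```

With a short, explicit timeout, an unreachable cache costs a few seconds once per build command before Nix falls back to building locally, regardless of the defaults on each machine.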

rarias approved these changes 2025-09-29 18:43:20 +02:00
rarias force-pushed hut-substituter from 4e3abb19a9 to 163d19bd05 2025-09-29 18:45:34 +02:00
abonerib manually merged commit 163d19bd05 into master 2025-09-29 18:48:06 +02:00

Reference: rarias/jungle#175