21 Commits

Author SHA1 Message Date
3590879c17 Add documentation section with commands to query the cache 2025-04-08 17:48:33 +02:00
31d03ee3ac Add instructions on how to use the cache in the web 2025-04-08 17:48:33 +02:00
4e92b14384 Use hut-substituter module in owl{1,2} and raccoon 2025-04-08 17:48:33 +02:00
b90209b4bf Add module to use hut as a binary substituter 2025-04-08 17:48:33 +02:00
785f7cfee8 Fix nginx /cache regex
`nix-serve` does not handle duplicates in the path:
```
hut$ curl http://127.0.0.1:5000/nix-cache-info
StoreDir: /nix/store
WantMassQuery: 1
Priority: 30
hut$ curl http://127.0.0.1:5000//nix-cache-info
File not found.
```

This meant that the cache was not accessible via:
`curl https://jungle.bsc.es/cache/nix-cache-info` but
`curl https://jungle.bsc.es/cachenix-cache-info` worked.
2025-04-08 17:48:33 +02:00
edf744db8d Add new GitLab runner for gitlab.bsc.es
It uses docker based on alpine and the host nix store, so we can perform
builds but isolate them from the system.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:41:18 +02:00
b82894eaec Remove SLURM partition all
We no longer have homogeneous nodes so it doesn't make much sense to
allocate a mix of them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:27 +02:00
1c47199891 Add varcila user to hut and fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:25 +02:00
8738bd4eeb Adjust fox slurm config after disabling SMT
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:23 +02:00
7699783aac Add abonerib user to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:21 +02:00
fee1d4da7e Don't move doc in web output
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:19 +02:00
b77ce7fb56 Add quickstart guide
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:17 +02:00
b4a12625c5 Reject SSH connections without SLURM allocation
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:15 +02:00
302106ea9a Add users to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:13 +02:00
96877de8d9 Add dalvare1 user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:11 +02:00
8878985be6 Add fox page in jungle website
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:08 +02:00
737578db34 Mount NVME disks in /nvme{0,1}
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:06 +02:00
88555e3f8c Exclude fox from being suspended by slurm
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:04 +02:00
feb2060be7 Use IPMI host names instead of IP addresses
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:01 +02:00
00999434c2 Add fox IPMI monitoring
Use agenix to store the credentials safely.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:14:59 +02:00
29d58cc62d Add new fox machine
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:14:42 +02:00
11 changed files with 177 additions and 44 deletions

View File

@@ -81,7 +81,7 @@
home = "/home/Computational/abonerib";
description = "Aleix Boné";
group = "Computational";
hosts = [ "owl1" "owl2" "hut" "raccoon" ];
hosts = [ "owl1" "owl2" "hut" "raccoon" "fox" ];
hashedPassword = "$6$V1EQWJr474whv7XJ$OfJ0wueM2l.dgiJiiah0Tip9ITcJ7S7qDvtSycsiQ43QBFyP4lU0e0HaXWps85nqB4TypttYR4hNLoz3bz662/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIIFiqXqt88VuUfyANkZyLJNiuroIITaGlOOTMhVDKjf abonerib@bsc"
@@ -126,6 +126,19 @@
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGEfy6F4rF80r4Cpo2H5xaWqhuUZzUsVsILSKGJzt5jF dalvare1@ssfhead"
];
};
varcila = {
uid = 5650;
isNormalUser = true;
home = "/home/Computational/varcila";
description = "Vincent Arcila";
group = "Computational";
hosts = [ "hut" "fox" ];
hashedPassword = "$6$oB0Tcn99DcM4Ch$Vn1A0ulLTn/8B2oFPi9wWl/NOsJzaFAWjqekwcuC9sMC7cgxEVb.Nk5XSzQ2xzYcNe5MLtmzkVYnRS1CqP39Y0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKGt0ESYxekBiHJQowmKpfdouw0hVm3N7tUMtAaeLejK vincent@varch"
];
};
};
groups = {

View File

@@ -22,8 +22,8 @@
"--docker-network-mode host"
];
environmentVariables = {
https_proxy = "http://localhost:23080";
http_proxy = "http://localhost:23080";
https_proxy = "http://hut:23080";
http_proxy = "http://hut:23080";
};
};
in {
@@ -38,14 +38,13 @@
gitlab-bsc-docker = {
# gitlab.bsc.es still uses the old token mechanism
registrationConfigFile = config.age.secrets.gitlab-bsc-docker.path;
tagList = [ "docker" "hut" ];
environmentVariables = {
https_proxy = "http://localhost:23080";
http_proxy = "http://localhost:23080";
# We cannot access the hut local interface from docker, so we connect
# to hut directly via the ethernet one.
https_proxy = "http://hut:23080";
http_proxy = "http://hut:23080";
};
# FIXME
registrationFlags = [
"--docker-network-mode host"
];
executor = "docker";
dockerImage = "alpine";
dockerVolumes = [
@@ -53,7 +52,15 @@
"/nix/var/nix/db:/nix/var/nix/db:ro"
"/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro"
];
dockerExtraHosts = [
# Required to pass the proxy via hut
"hut:10.0.40.7"
];
dockerDisableCache = true;
registrationFlags = [
# Increase build log length to 64 MiB
"--output-limit 65536"
];
preBuildScript = pkgs.writeScript "setup-container" ''
mkdir -p -m 0755 /nix/var/log/nix/drvs
mkdir -p -m 0755 /nix/var/nix/gcroots
@@ -66,32 +73,38 @@
mkdir -p -m 0700 "$HOME/.nix-defexpr"
mkdir -p -m 0700 "$HOME/.ssh"
cat > "$HOME/.ssh/config" << EOF
Host bscpm03.bsc.es gitlab-internal.bsc.es
Host bscpm04.bsc.es gitlab-internal.bsc.es
User git
ProxyCommand nc -X connect -x hut:23080 %h %p
Host amdlogin1.bsc.es armlogin1.bsc.es hualogin1.bsc.es glogin1.bsc.es glogin2.bsc.es fpgalogin1.bsc.es
ProxyCommand nc -X connect -x hut:23080 %h %p
EOF
cat >> "$HOME/.ssh/known_hosts" << EOF
bscpm03.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIM2NuSUPsEhqz1j5b4Gqd+MWFnRqyqY57+xMvBUqHYUS
bscpm04.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPx4mC0etyyjYUT2Ztc/bs4ZXSbVMrogs1ZTP924PDgT
gitlab-internal.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF9arsAOSRB06hdy71oTvJHG2Mg8zfebADxpvc37lZo3
EOF
. ${pkgs.nix}/etc/profile.d/nix-daemon.sh
${pkgs.nix}/bin/nix-channel --add https://nixos.org/channels/nixos-24.11 nixpkgs
${pkgs.nix}/bin/nix-channel --update nixpkgs
${pkgs.nix}/bin/nix-env -i ${lib.concatStringsSep " " (with pkgs; [ nix cacert git openssh netcat curl ])}
# Required to load SSL certificate paths
. ${pkgs.cacert}/nix-support/setup-hook
'';
environmentVariables = {
ENV = "/etc/profile";
USER = "root";
NIX_REMOTE = "daemon";
PATH = "/nix/var/nix/profiles/default/bin:/nix/var/nix/profiles/default/sbin:/bin:/sbin:/usr/bin:/usr/sbin";
NIX_SSL_CERT_FILE = "/nix/var/nix/profiles/default/etc/ssl/certs/ca-bundle.crt";
PATH = "${config.system.path}/bin:/bin:/sbin:/usr/bin:/usr/sbin";
};
};
};
};
# DOCKER* chains are useless, override at FORWARD
networking.firewall.extraCommands = ''
# Allow docker to use our proxy
iptables -I FORWARD 1 -p tcp -i docker0 -d hut --dport 23080 -j nixos-fw-accept
# Block anything else coming from docker
iptables -I FORWARD 2 -p all -i docker0 -j nixos-fw-log-refuse
'';
#systemd.services.gitlab-runner.serviceConfig.Shell = "${pkgs.bash}/bin/bash";
systemd.services.gitlab-runner.serviceConfig.DynamicUser = lib.mkForce false;
systemd.services.gitlab-runner.serviceConfig.User = "gitlab-runner";

View File

@@ -76,7 +76,7 @@
group = "root";
user = "root";
configFile = config.age.secrets.ipmiYml.path;
extraFlags = [ "--log.level=debug" ];
# extraFlags = [ "--log.level=debug" ];
listenAddress = "127.0.0.1";
};
node = {

View File

@@ -12,6 +12,8 @@ let
installPhase = ''
cp -r public $out
'';
# Don't mess doc/
dontFixup = true;
};
in
{
@@ -38,7 +40,7 @@ in
proxy_redirect http:// $scheme://;
}
location /cache {
rewrite ^/cache(.*) /$1 break;
rewrite ^/cache/(.*) /$1 break;
proxy_pass http://127.0.0.1:5000;
proxy_redirect http:// $scheme://;
}

View File

@@ -0,0 +1,10 @@
{ config, ... }:
{
nix.settings =
# Don't add hut as a cache to itself
assert config.networking.hostName != "hut";
{
substituters = [ "https://jungle.bsc.es/cache" ];
trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
};
}

View File

@@ -27,22 +27,6 @@ let
done
'';
prolog = pkgs.writeScript "prolog.sh" ''
#!/usr/bin/env bash
echo "hello from the prolog"
exit 0
'';
epilog = pkgs.writeScript "epilog.sh" ''
#!/usr/bin/env bash
echo "hello from the epilog"
exit 0
'';
in {
systemd.services.slurmd.serviceConfig = {
# Kill all processes in the control group on stop/restart. This will kill
@@ -59,14 +43,13 @@ in {
clusterName = "jungle";
nodeName = [
"owl[1,2] Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 Feature=owl"
"fox Sockets=2 CoresPerSocket=96 ThreadsPerCore=2 Feature=fox"
"fox Sockets=2 CoresPerSocket=96 ThreadsPerCore=1 Feature=fox"
"hut Sockets=2 CoresPerSocket=14 ThreadsPerCore=2"
];
partitionName = [
"owl Nodes=owl[1-2] Default=YES DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
"fox Nodes=fox Default=NO DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
"all Nodes=owl[1-2],hut Default=NO DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
];
# See slurm.conf(5) for more details about these options.

View File

@@ -8,6 +8,7 @@
../module/slurm-client.nix
../module/slurm-firewall.nix
../module/debuginfod.nix
../module/hut-substituter.nix
];
# Select the this using the ID to avoid mismatches

View File

@@ -8,6 +8,7 @@
../module/slurm-client.nix
../module/slurm-firewall.nix
../module/debuginfod.nix
../module/hut-substituter.nix
];
# Select the this using the ID to avoid mismatches

View File

@@ -3,6 +3,7 @@
{
imports = [
../common/base.nix
../module/hut-substituter.nix
];
# Don't install Grub on the disk yet

View File

@@ -1,9 +1,11 @@
age-encryption.org/v1
-> ssh-ed25519 HY2yRg 4Xns3jybBuv8flzd+h3DArVBa/AlKjt1J9jAyJsasCE
uyVjJxh5i8aGgAgCpPl6zTYeIkf9mIwURof51IKWvwE
-> ssh-ed25519 CAWG4Q T2r6r1tyNgq1XlYXVtLJFfOfUnm6pSVlPwUqC1pkyRo
9yDoKU0EC34QMUXYnsJvhPCLm6oD9w7NlTi2sheoBqQ
-> ssh-ed25519 MSF3dg Bh9DekFTq+QMUEAonwcaIAJX4Js1O7cHjDniCD0gtm8
t/Ro0URLeDUWcvb7rlkG2s03PZ+9Rr3N4TIX03tXpVc
--- E5+/D4aK2ihKRR4YC5XOTmUbKgOqBR0Nk0gYvFOzXOI
<EFBFBD><EFBFBD><EFBFBD>yKF~dj<64><6A>r%<25><>'<27><><EFBFBD>P<EFBFBD>&_-l<><6C><EFBFBD>&<26>o<EFBFBD>_<EFBFBD>r<><72>r<EFBFBD><72>߁<EFBFBD>0<18>,<2C>U7<55>nC<6E>Te<54><18>[f<>97<39><37><EFBFBD><EFBFBD><EFBFBD><EFBFBD><10><><EFBFBD>C!D<>E<EFBFBD>W<EFBFBD>*<2A>LA<4C>x6<78>#<EFBFBD><EFBFBD>
-> ssh-ed25519 HY2yRg WSdjyQPzBJ4JbzQpGeq1AAYpWKoXmLI1ZtmNmM5QOzs
qGDlDT31DQF1DdHen0+5+52DdsQlabJdA2pOB5O1I6g
-> ssh-ed25519 CAWG4Q wioWMDxQjN+d4JdIbCwZg0DLQu1OH2mV6gukRprjuAs
670fE61hidOEh20hHiQAhP0+CjDF0WMBNzgwkGT8Yqg
-> ssh-ed25519 MSF3dg DN19uvAEtqq4708P6HpuX9i/o/qAvHX6dj69dCF2H1o
4Lu9GnjiFLMeXJ2C7aVPJsCHCQVlhylNWJi896Av92s
--- 7cKBwOYNOUZ2h3/kAY09aSMASZSxX7hZIT4kvlIiT6w
<EFBFBD>6<EFBFBD><02><><EFBFBD><06>fQF5=<3D>bX+<2B>v e`<60>7/<2F><05>A~P<><50>Ѧ7<EFBFBD><EFBFBD>
<EFBFBD><EFBFBD>A<EFBFBD>)<29>h<05><>=oZ<6F>$<24> ^<5E>V0<56><>r
k<EFBFBD>u<EFBFBD>bĶ:R<><52>>^g<><67><06>ik_*% <0B>a7<61>KG<4B><47><EFBFBD><EFBFBD><EFBFBD><EFBFBD>&<26>PI<50><49>n

View File

@@ -13,6 +13,113 @@ which is available at `hut` or `xeon07`. It runs the following services:
- Grafana: to plot the data in the web browser.
- Slurmctld: to manage the SLURM nodes.
- Gitlab runner: to run CI jobs from Gitlab.
- Nix binary cache: to serve cached nix builds
This node is prone to interruptions from all the services it runs, so it is not
a good candidate for low noise executions.
# Binary cache
We provide a binary cache in `hut`, with the aim of avoiding unnecessary
recompilation of packages.
The cache should contain common packages from bsckpkgs, but we don't provide
any guarantee that of what will be available in the cache, or for how long.
We recommend following the latest version of the `jungle` flake to avoid cache
misses.
## Usage
### From NixOS
In NixOS, we can add the cache through the `nix.settings` option, which will
enable it for all builds in the system.
```nix
{ ... }: {
nix.settings = {
substituters = [ "https://jungle.bsc.es" ];
trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
};
}
```
### Interactively
The cache can also be specified in a per-command basis through the flags
`--substituters` and `--trusted-public-keys`:
```sh
nix build --substituters "https://jungle.bsc.es/cache" --trusted-public-keys "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" <...>
```
### Nix configuration file (non-nixos)
If using nix outside of NixOS, you'll have to update `nix.conf`
```bash
echo "substituters = https://jungle.bsc.es" >> /etc/nix/nix.conf
echo "trusted-public-keys = jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" >> /etc/nix/nix.conf
```
### Hint in flakes
By adding the configuration below to a `flake.nix`, when someone uses the flake,
`nix` will interactively ask to trust and use the provided binary cache:
```nix
{
nixConfig = {
extra-substituters = [
"https://jungle.bsc.es/cache"
];
extra-trusted-public-keys = [
"jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0="
];
};
outputs = { ... }: {
...
};
}
```
### Querying the cache
Check if the cache is available:
```sh
$ curl https://jungle.bsc.es/cache/nix-cache-info
StoreDir: /nix/store
WantMassQuery: 1
Priority: 30
```
Prevent nix from building locally:
```bash
nix build --max-jobs 0 <...>
```
Check if a package is in cache:
```bash
# Do a raw eval on the <package>.outPath (this should not build the package)
$ nix eval --raw jungle#openmp.outPath
/nix/store/dwnn4dgm1m4184l4xbi0qfrprji9wjmi-openmp-2024.11
# Take the hash (everything from / to - in the basename) and curl <hash>.narinfo
# if it exists in the cache, it will return HTTP 200 and some information
# if not, it will return 404
$ curl https://jungle.bsc.es/cache/dwnn4dgm1m4184l4xbi0qfrprji9wjmi.narinfo
StorePath: /nix/store/dwnn4dgm1m4184l4xbi0qfrprji9wjmi-openmp-2024.11
URL: nar/dwnn4dgm1m4184l4xbi0qfrprji9wjmi-17imkdfqzmnb013d14dx234bx17bnvws8baf3ii1xra5qi2y1wiz.nar
Compression: none
NarHash: sha256:17imkdfqzmnb013d14dx234bx17bnvws8baf3ii1xra5qi2y1wiz
NarSize: 1519328
References: 4gk773fqcsv4fh2rfkhs9bgfih86fdq8-gcc-13.3.0-lib nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36
Deriver: vcn0x8hikc4mvxdkvrdxp61bwa5r7lr6-openmp-2024.11.drv
Sig: jungle.bsc.es:GDTOUEs1jl91wpLbb+gcKsAZjpKdARO9j5IQqb3micBeqzX2M/NDtKvgCS1YyiudOUdcjwa3j+hyzV2njokcCA==
# In oneline:
$ curl "https://jungle.bsc.es/cache/$(nix eval --raw jungle#<package>.outPath | cut -d '/' -f4 | cut -d '-' -f1).narinfo"
```
#### References
- https://nix.dev/guides/recipes/add-binary-cache.html
- https://nixos.wiki/wiki/Binary_Cache