299 Commits

Author SHA1 Message Date
7c901742e0 Allow traffic from docker to enter port 23080
Before:

  hut% sudo docker run -it --rm alpine /bin/ash -xc 'true | nc -w 3 -v 10.0.40.7 23080'
  + true
  + nc -w 3 -v 10.0.40.7 23080
  nc: 10.0.40.7 (10.0.40.7:23080): Operation timed out

After:

  hut% sudo docker run -it --rm alpine /bin/ash -xc 'true | nc -w 3 -v 10.0.40.7 23080'
  + true
  + nc -w 3 -v 10.0.40.7 23080
  10.0.40.7 (10.0.40.7:23080) open

Fixes: #94
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-15 12:17:00 +02:00
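The `nc -w 3 -v` probe used in commit 7c901742e0 can be approximated without netcat. What follows is a minimal Python sketch, not code from the repository: the helper name `port_open` is hypothetical, and the 3 second default only mirrors the `-w 3` flag in the transcript.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, unreachable hosts and timeouts.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call with the address and port from the commit message:
#   port_open("10.0.40.7", 23080)
```

Run from inside a container, a `False` result would correspond to the "Operation timed out" case and `True` to the "open" case shown in the transcripts.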
a492e06327 Add bscpm04.bsc.es SSH host and public key
Allows fetching repositories from hut and other machines in jungle
without the need to do any extra configuration.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-11 12:15:33 +02:00
4ed53d4384 Use hut nix cache in owl1, owl2 and raccoon
For owl1 and owl2 directly connect to hut via LAN with HTTP, but for
raccoon pass via the proxy using jungle.bsc.es with HTTPS. There is no
risk of tampering as packages are signed.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-02-26 16:03:26 +01:00
ad26c63fa2 Clean all iptables rules on stop
Prevents the "iptables: Chain already exists." error by making sure that
no chain is left behind when the firewall starts again. The ideal solution
is to use iptables-restore instead, which will do the right job. But this
needs to be changed in NixOS entirely.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-11 10:23:26 +02:00
563dc575fd Make nginx listen on all interfaces
Needed for local hosts to contact the nix cache via HTTP directly.
We also allow the incoming traffic on port 80.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-11 10:03:05 +02:00
097c7bc31f Fix nginx /cache regex
`nix-serve` does not handle duplicates in the path:
```
hut$ curl http://127.0.0.1:5000/nix-cache-info
StoreDir: /nix/store
WantMassQuery: 1
Priority: 30
hut$ curl http://127.0.0.1:5000//nix-cache-info
File not found.
```

This meant that the cache was not accessible via:
`curl https://jungle.bsc.es/cache/nix-cache-info` but
`curl https://jungle.bsc.es/cachenix-cache-info` worked.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-02-26 15:31:05 +01:00
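One common way to end up with the duplicated slash described in commit 097c7bc31f is a proxy rule that replaces a matched prefix with the upstream root. The Python sketch below models that substitution. It is an assumption about the failure mode, not the actual nginx rule from the repository, and `rewrite` is a hypothetical helper.

```python
def rewrite(path: str, prefix: str, upstream_root: str = "/") -> str:
    """Replace a matched location prefix with the upstream root."""
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} does not match {prefix!r}")
    return upstream_root + path[len(prefix):]


# Prefix without a trailing slash: the slash after "cache" survives and is duplicated.
broken = rewrite("/cache/nix-cache-info", "/cache")
# The same rule happens to work when the request omits that slash.
accidental = rewrite("/cachenix-cache-info", "/cache")
# Including the trailing slash in the matched prefix removes the duplicate.
fixed = rewrite("/cache/nix-cache-info", "/cache/")

print(broken, accidental, fixed)
```

Under this model the first request reaches the upstream as `//nix-cache-info`, which is the form `nix-serve` rejects in the transcript, while the other two arrive as `/nix-cache-info`.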
17e42b3872 Add new GitLab runner for gitlab.bsc.es
It uses docker based on alpine and the host nix store, so we can perform
builds but isolate them from the system.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-24 13:00:54 +01:00
db04825a11 Remove SLURM partition all
We no longer have homogeneous nodes so it doesn't make much sense to
allocate a mix of them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-07 16:17:32 +02:00
7f395ba2d9 Add varcila user to hut and fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-03-28 11:53:33 +01:00
5683fe5be1 Adjust fox slurm config after disabling SMT
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-03-28 11:04:19 +01:00
b44bdfb10f Add abonerib user to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-25 14:33:11 +01:00
b1adbed3de Don't move doc in web output
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-14 16:36:57 +01:00
8ff54219f6 Reject SSH connections without SLURM allocation
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-13 14:47:38 +01:00
580bfad9ec Add users to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-12 16:46:56 +01:00
afe7ae445b Add dalvare1 user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-12 16:39:51 +01:00
9dea4e2379 Mount NVME disks in /nvme{0,1}
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-12 15:49:55 +01:00
b046baee48 Exclude fox from being suspended by slurm
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-12 15:02:18 +01:00
8766fd8439 Use IPMI host names instead of IP addresses
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-12 12:14:40 +01:00
b70d99f479 Add fox IPMI monitoring
Use agenix to store the credentials safely.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-12 11:36:53 +01:00
a0eae1feea Add new fox machine
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-02-11 12:56:30 +01:00
e9740c471d Update PM GitLab tokens to new URL
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-15 14:38:57 +01:00
9b183c4202 Fix MPICH build by fetching upstream patches too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-15 13:16:10 +01:00
90036b8ea2 flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/de96bd907d5fbc3b14fc33ad37d1b9a3cb15edc6' (2024-07-09)
  → 'github:ryantm/agenix/f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41' (2024-08-10)
• Updated input 'bscpkgs':
    'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=de89197a4a7b162db7df9d41c9d07759d87c5709' (2024-04-24)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f' (2024-11-29)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/693bc46d169f5af9c992095736e82c3488bf7dbb' (2024-07-14)
  → 'github:NixOS/nixpkgs/9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc' (2025-01-14)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-15 12:44:51 +01:00
bb4e42e149 Set nixpkgs to track nixos-24.11
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-15 12:43:45 +01:00
23aa682816 Add script to monitor GPFS
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-14 12:01:00 +01:00
3e26c69f69 Add BSC machines to ssh config
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-14 15:51:34 +01:00
aa977ee62a Collect statistics from logged users
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-11-14 12:21:13 +01:00
7b9d805d12 Add custom GPFS exporter for MN5
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-11-12 16:30:24 +01:00
4aa011ff85 Remove exception to fetch task endpoint
It causes the request to go to the website rather than the Gitea
service.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-10-22 16:13:01 +02:00
4b41b67d25 Use SSD for boot, then switch to NVME
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-10-21 14:28:17 +02:00
e3f6e67348 Use NVME as root
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-10-17 14:39:31 +02:00
129fa52e9b Keep host header for Grafana requests
This was breaking requests due to the CSRF check.

See: https://github.com/grafana/grafana/issues/45117#issuecomment-1033842787
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-10-17 13:35:45 +02:00
0e1ea5d504 Ignore logging requests from the gitea runner
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-20 15:44:22 +02:00
95eef3b0c5 Log the client IP not the proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-20 15:24:38 +02:00
7d25055f98 Ignore misc directory
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-20 15:25:06 +02:00
b978f12d19 Create paste directories in /ceph/p
Ensure that all hut users have a paste directory in /ceph/p owned by
themselves. We need to wait for the ceph mount point to create them, so
we use a systemd service that waits for the remote-fs.target.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-20 11:19:30 +02:00
c1617266b6 Add p command to paste files
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-16 16:33:42 +02:00
83830dbfed Use nginx to serve website and other services
Instead of using multiple tunnels to forward all our services to the VM
that serves jungle.bsc.es, just use nginx to redirect the traffic from
hut. This allows adding custom rules for paths that are not possible
otherwise.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-16 16:33:34 +02:00
0bcac3bca4 Mount the NVME disk in /nvme
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-23 16:15:26 +02:00
f41771d55f Delay nix-gc until /home is mounted
Prevents starting the garbage collector before the remote FS are
mounted, in particular /home. Otherwise, all the gcroots which have
symlinks in /home will be considered stale and they will be removed.

See: #79
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-09-18 11:04:44 +02:00
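The failure mode behind commit f41771d55f can be reproduced in miniature: a root that is a symlink into a directory that is not there yet looks dangling. The Python sketch below uses temporary directories as stand-ins for /home and the gcroots directory. The names `is_stale`, `home` and `gcroots` are illustrative assumptions and this is not how the Nix garbage collector is implemented.

```python
import os
import tempfile


def is_stale(root_link: str) -> bool:
    """Treat a root as stale when its symlink target cannot be resolved."""
    # os.path.exists follows symlinks, so a dangling link yields False.
    return os.path.islink(root_link) and not os.path.exists(root_link)


with tempfile.TemporaryDirectory() as tmp:
    home = os.path.join(tmp, "home")        # stands in for the remote /home
    gcroots = os.path.join(tmp, "gcroots")  # stands in for the gcroots directory
    os.makedirs(gcroots)
    root = os.path.join(gcroots, "result")
    os.symlink(os.path.join(home, "user", "result"), root)

    before_mount = is_stale(root)  # target missing: /home is not "mounted" yet

    os.makedirs(os.path.join(home, "user", "result"))  # simulate the mount appearing
    after_mount = is_stale(root)

print(before_mount, after_mount)
```

The same link is reported as stale before the target directory exists and as valid afterwards, which is why the collector has to wait for the remote file systems.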
1e90c038a1 Add dbautist user with access to hut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-09-18 15:21:01 +02:00
439f40240f Set the serial console to ttyS1 in raccoon
Apparently the ttyS0 console doesn't exist but ttyS1 does:

  raccoon% sudo stty -F /dev/ttyS0
  stty: /dev/ttyS0: Input/output error
  raccoon% sudo stty -F /dev/ttyS1
  speed 9600 baud; line = 0;
  -brkint -imaxbel

The dmesg line agrees:

  00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

The console configuration is then moved from base to xeon to allow
changing it for the raccoon machine.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-22 13:34:19 +02:00
e5feebbd8f Remove setLdLibraryPath and driSupport options
They have been removed from NixOS. The "hardware.opengl" group is now
renamed to "hardware.graphics".

See: 98cef4c273
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-22 12:36:20 +02:00
38f0fb7f78 Add documentation section about GRUB chain loading
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-06-07 10:40:37 +02:00
bb566b7eeb Add 10 min shutdown jitter to avoid spikes
The shutdown timer will fire at slightly different times for the
different nodes, so we slowly decrease the power consumption.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-22 11:20:02 +02:00
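The effect of the jitter in commit bb566b7eeb can be pictured as each node drawing its own delay from a 10 minute window. The Python sketch below only illustrates that idea: the node list and the helper `shutdown_offsets` are assumptions, and the repository may rely on a different mechanism (systemd timers, for instance, offer `RandomizedDelaySec=`).

```python
import random

JITTER_SECONDS = 10 * 60  # 10 minute window, taken from the commit title


def shutdown_offsets(nodes, seed=None):
    """Map each node to a random delay in [0, JITTER_SECONDS] seconds."""
    rng = random.Random(seed)
    return {node: rng.uniform(0, JITTER_SECONDS) for node in nodes}


# Illustrative node list; a fixed seed makes the run reproducible.
offsets = shutdown_offsets(["owl1", "owl2", "raccoon", "fox"], seed=0)
for node, delay in sorted(offsets.items(), key=lambda kv: kv[1]):
    print(f"{node}: shut down {delay:.0f} s after the scheduled time")
```

Because the delays differ per node, the machines power off one after another instead of all at once.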
f7d60c4bbe Don't mount the nix store in owl nodes
Initially we planned to run jobs in those nodes by sharing the same nix
store from hut. However, these nodes are now used to build packages
which are not available in hut. Users also ssh to the nodes, and an ssh
session doesn't mount the hut store, so it doesn't make much sense to
keep mounting it.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-22 11:02:32 +02:00
3c1be2d4b4 Emulate other architectures in owl nodes too
Allows cross-compilation of packages for RISC-V that are known to try to
run RISC-V programs in the host.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-19 17:53:10 +02:00
b04a064583 Program shutdown for August 2nd for all machines
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-18 18:01:45 +02:00
e78021c319 Enable debuginfod daemon in owl nodes
WARNING: This will introduce noise, as the daemon wakes up from time to
time to check for new packages.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-18 16:12:16 +02:00
2cba78cee1 Set gitea and grafana log level to warn
Prevents filling the journal logs with information messages.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-07-18 13:39:16 +02:00