Add Fox to SLURM #150

Closed
opened 2025-07-23 12:41:42 +02:00 by rarias · 4 comments
Owner

We may be able to make SLURM see Fox by allowing incoming connections to Fox from Apex. However, SLURM needs a lot of ports open, which may not be a very good idea.

Author
Owner

We should move the SLURM controller to apex first.

rarias added a new dependency 2025-08-27 11:47:33 +02:00
Author
Owner

Fox cannot reach apex, but apex can reach fox. Unfortunately, SLURM needs to open connections in both directions (needlessly).

We could:

  • Move apex to the outside world, but that would change network access and open the door to attacks. It may be a net benefit in the long run, as we could directly expose other services.
  • Encapsulate the traffic in a secure channel. Maybe we can set up a WireGuard link and harden the allowed traffic via iptables.

Having direct visibility with static IPs on both ends seems ok, as we can whitelist both ends only and avoid network noise.
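
A minimal sketch of what the iptables hardening on fox could look like, printed as a dry run rather than applied. The interface name `wg0` and the tunnel address `10.0.0.1` for apex are placeholders, not values from the actual setup; 6818 is the SLURM default for SlurmdPort and only holds if slurm.conf does not override it.

```shell
#!/bin/sh
# Sketch only: prints iptables rules for fox instead of applying them.
# WG_IF and APEX_WG_IP are hypothetical placeholders.
WG_IF="wg0"            # assumed WireGuard interface name
APEX_WG_IP="10.0.0.1"  # assumed tunnel address of apex
SLURMD_PORT="6818"     # SLURM default SlurmdPort

# Print rules that let apex reach slurmd on fox through the tunnel and
# drop any other new inbound traffic arriving on the tunnel interface.
fox_tunnel_rules() {
    # Replies to connections that fox itself opened (e.g. to slurmctld).
    echo "iptables -A INPUT -i $WG_IF -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    # New connections from apex to slurmd.
    echo "iptables -A INPUT -i $WG_IF -s $APEX_WG_IP -p tcp --dport $SLURMD_PORT -j ACCEPT"
    # Everything else on the tunnel is dropped.
    echo "iptables -A INPUT -i $WG_IF -j DROP"
}

fox_tunnel_rules
```

Apex would need mirror rules for traffic coming from fox: the controller port (SlurmctldPort, 6817 by default) and whatever range srun listens on, which can be pinned with SrunPortRange in slurm.conf.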

Author
Owner

Making a direct connection from Fox to Apex is not safe, as SLURM may not protect packets from manipulation, so a man-in-the-middle attacker may be able to inject commands on Fox. The only safe solution is to pass the traffic through a secure channel that provides authentication.

Author
Owner

Configured SLURM to pass through the WireGuard tunnel. It seems to be working fine:

apex% sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
owl*         up   infinite      2  idle~ owl[1-2]
fox          up   infinite      1   idle fox

apex% scontrol show node fox
NodeName=fox Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUEfctv=192 CPUTot=192 CPULoad=0.06
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=fox NodeHostName=fox Version=24.11.5
   OS=Linux 6.15.6 #1-NixOS SMP PREEMPT_DYNAMIC Thu Jul 10 14:08:55 UTC 2025
   RealMemory=1 AllocMem=0 FreeMem=640380 Sockets=8 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=fox
   BootTime=2025-08-05T16:23:24 SlurmdStartTime=2025-08-29T14:58:59
   LastBusyTime=2025-08-29T15:02:33 ResumeAfterTime=None
   CfgTRES=cpu=192,mem=1M,billing=192
   AllocTRES=
   CurrentWatts=0 AveWatts=0
   
apex% srun -p fox hostname
fox
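
The manual check above could be scripted, for instance to notice when the tunnel goes down. A small sketch, assuming the default `sinfo` column layout shown above; `partition_state` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch only: read default `sinfo` output on stdin and print the STATE
# column for the partition given as $1.
partition_state() {
    awk -v p="$1" '
        NR > 1 {
            name = $1
            sub(/\*$/, "", name)   # the default partition is marked with "*"
            if (name == p) print $5
        }'
}
```

Running `sinfo | partition_state fox` on apex should print `idle` for the state shown above. A state with a `*` suffix (such as `idle*`) means the node is not responding.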
Reference: rarias/jungle#150