jungle-backup/_index.md at d3b355f6511288e70f22d73cb88473a1a8e65716

Rodrigo Arias Mallo d3b355f651 Add /nfs/home to fox documentation

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>

2025-09-03 15:34:05 +02:00

3.1 KiB


title: Fox
description: AMD Genoa 9684X with 2 NVIDIA RTX4000 GPUs
date: 2025-02-12

Picture by Joanne Redwood, CC0.

The fox machine is a large GPU server configured to run heavy workloads. It has two fast AMD CPUs with a large cache and two mid-range NVIDIA GPUs. Here are the detailed specifications:

2x AMD Genoa-X EPYC 9684X DP/UP, 96C/192T, 2.55 GHz, 1,150 MB L3 cache, 400 W, SP5, 3D V-Cache
24x 32 GB DDR5-4800 ECC RDIMM (768 GiB of RAM in total)
1x 2.5" Micron 5400 MAX 480 GB SATA3 SSD
2x 2.5" KIOXIA CM7-R 1.92 TB NVMe SSD, PCIe 5.0 x4
2x NVIDIA RTX 4000 Ada Generation, 20 GB GDDR6, PCIe 4.0

Access

To access the machine, request a SLURM allocation from apex on the fox partition. If you need the machine for performance measurements, request exclusive access to the whole node:

apex% salloc -p fox --exclusive

Otherwise, request only the number of CPUs you need (-c sets the CPUs per task), so other users can use the node at the same time:

apex% salloc -p fox -c 8

Then use srun to start an interactive shell on fox:

apex% srun --pty $SHELL
fox%

Check that you got all the CPUs you expect (this output is from an exclusive allocation):

fox% grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list:	0-191
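Cpus_allowed_list uses the kernel's cpulist format: comma-separated CPU numbers and ranges such as 0-3,8,10-11. To compare that list with the number of CPUs you asked for, a small counting function can help. The sketch below is an illustration only; count_cpulist is a hypothetical name, not an existing tool on apex or fox.

```shell
#!/bin/bash
# count_cpulist: hypothetical helper that counts the CPUs in a kernel
# cpulist string such as "0-191" or "0-3,8,10-11".
count_cpulist() {
  local list=$1 total=0 part lo hi
  local IFS=','            # split the list on commas only
  for part in $list; do
    if [[ $part == *-* ]]; then
      lo=${part%-*}        # first CPU of the range
      hi=${part#*-}        # last CPU of the range
      total=$(( total + hi - lo + 1 ))
    else
      total=$(( total + 1 ))   # a single CPU number
    fi
  done
  echo "$total"
}

# Count the CPUs available to this shell (awk inherits its CPU affinity).
allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
echo "allowed CPUs: $(count_cpulist "$allowed")"
```

Assuming the cluster confines jobs to their allocated CPUs, a session opened with salloc -p fox -c 8 should report 8. The coreutils nproc command usually gives the same number, but it also honors OMP_NUM_THREADS when that variable is set.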

Follow these steps if you don't have access to apex or fox.
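Besides interactive sessions, the same request can be submitted as a batch job from apex with sbatch. The script below is a minimal sketch: the partition and CPU options mirror the salloc commands above, setting OMP_NUM_THREADS assumes an OpenMP program, and my_program is a placeholder for your own workload.

```shell
#!/bin/bash
#SBATCH --partition=fox
#SBATCH --cpus-per-task=8
# For performance measurements, remove one leading "#" to request the whole node:
##SBATCH --exclusive

# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 when the script runs outside SLURM.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

echo "Running on $(uname -n) with OMP_NUM_THREADS=$OMP_NUM_THREADS"
grep Cpus_allowed_list /proc/self/status

# Replace with your workload, for example (placeholder binary):
#   srun ./my_program
```

Save it under any name, for example job.sh, and submit it with sbatch job.sh. SLURM only reads #SBATCH lines that appear before the first command, which is why all the options sit at the top.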

CUDA

To use CUDA, place the following flake.nix in a new directory; it loads all the required dependencies:

{
  # "jungle" is an indirect flake reference, resolved through the flake registry.
  inputs.jungle.url = "jungle";

  outputs = { jungle, ... }: {
    devShells.x86_64-linux.default = let
      pkgs = jungle.nixosConfigurations.fox.pkgs;
    in pkgs.mkShell {
      name = "cuda-env-shell";
      buildInputs = with pkgs; [
        git gitRepo gnupg autoconf curl
        procps gnumake util-linux m4 gperf unzip

        # Cuda packages (more at https://search.nixos.org/packages)
        cudatoolkit linuxPackages.nvidia_x11
        cudaPackages.cuda_cudart.static
        cudaPackages.libcusparse

        libGLU libGL
        xorg.libXi xorg.libXmu freeglut
        xorg.libXext xorg.libX11 xorg.libXv xorg.libXrandr zlib
        ncurses5 stdenv.cc binutils
      ];
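      shellHook = ''
        export CUDA_PATH=${pkgs.cudatoolkit}
        # Use the NVIDIA driver libraries (libcuda.so) of the running system.
        export LD_LIBRARY_PATH=/var/run/opengl-driver/lib
        # SM architecture for the CUDA samples Makefiles; the RTX 4000 Ada is sm_89.
        export SMS=89
      '';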
    };
  };
}

Then just run nix develop from the same directory:

% mkdir cuda
% cd cuda
% vim flake.nix
[...]
% nix develop
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
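Inside the development shell, the shellHook should have set CUDA_PATH and LD_LIBRARY_PATH. The function below sketches a quick sanity check of both; check_cuda_env is a hypothetical name, and it assumes that nvcc lives under $CUDA_PATH/bin and that the driver's libcuda.so is found in one of the LD_LIBRARY_PATH directories.

```shell
#!/bin/bash
# check_cuda_env: hypothetical helper to run inside "nix develop".
# Returns non-zero if the variables exported by the shellHook look wrong.
check_cuda_env() {
  if [ -z "${CUDA_PATH:-}" ] || [ ! -x "$CUDA_PATH/bin/nvcc" ]; then
    echo "CUDA_PATH is unset or has no bin/nvcc" >&2
    return 1
  fi
  local dir
  local IFS=':'            # LD_LIBRARY_PATH is a colon-separated list
  for dir in ${LD_LIBRARY_PATH:-}; do
    # libcuda.so comes from the NVIDIA driver, not from the CUDA toolkit.
    if compgen -G "$dir/libcuda.so*" > /dev/null; then
      echo "ok: nvcc in $CUDA_PATH/bin, libcuda in $dir"
      return 0
    fi
  done
  echo "libcuda.so not found in LD_LIBRARY_PATH" >&2
  return 1
}

check_cuda_env || echo "CUDA environment not ready"
```

Checking for libcuda.so separately matters because nvcc -V only proves that the toolkit is present; compiled programs still need the driver library at run time.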

Filesystems

The machine has several file systems available.

/nfs/home: the /home of apex mounted via NFS, which is also shared with the other Xeon machines. Its latency is about 2 ms, so it is not suitable for fast random access.
/nvme{0,1}/$USER: the two local NVMe disks, very fast and with a large capacity.
/tmp: a tmpfs, fast but not backed by a disk; its contents are erased on reboot.
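Because of the NFS latency, jobs that perform many small reads will likely run faster if their input is first copied to a local disk. The function below is a minimal sketch of that staging step, assuming /nvme0/$USER as the scratch directory; stage_to_local and the paths in the usage comment are hypothetical.

```shell
#!/bin/bash
# stage_to_local: hypothetical helper that copies a file or directory
# (e.g. from /nfs/home) to a local scratch directory and prints the new path.
# Usage: stage_to_local SRC [SCRATCH_ROOT]   (default root: /nvme0/$USER)
stage_to_local() {
  local src=$1
  local root=${2:-/nvme0/$USER}
  if [ ! -e "$src" ]; then
    echo "stage_to_local: $src does not exist" >&2
    return 1
  fi
  mkdir -p "$root" || return 1
  # cp -a copies directories recursively and preserves permissions and timestamps.
  cp -a "$src" "$root/" || return 1
  printf '%s\n' "$root/$(basename "$src")"
}

# Example (hypothetical paths):
#   data=$(stage_to_local "/nfs/home/$USER/input")
#   ./my_program "$data"
```

Files left on the local NVMe disks are only visible on fox, so copy any results you need back to /nfs/home.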
