Rodrigo Arias Mallo 8835dbd764 Add AMD uProf section to fox documentation
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:22 +02:00

title: Fox
description: AMD Genoa 9684X with 2 NVIDIA RTX4000 GPUs
date: 2025-02-12

Fox

Picture by Joanne Redwood, CC0.

The fox machine is a large GPU server configured to run heavy workloads. It has two fast AMD CPUs with a large cache and two NVIDIA RTX 4000 Ada GPUs. Here are the detailed specifications:

  • 2x AMD GENOA X 9684X DP/UP 96C/192T 2.55G 1,150M 400W SP5 3D V-Cache
  • 24x 32GB DDR5-4800 ECC RDIMM (total 768 GiB of RAM)
  • 1x 2.5" SSD SATA3 MICRON 5400 MAX 480GB
  • 2x 2.5" KIOXIA CM7-R 1.92TB NVMe GEN5 PCIe 5x4
  • 2x NVIDIA RTX4000 ADA Gen 20GB GDDR6 PCIe 4.0

Access

To access the machine, request a SLURM session from apex using the fox partition. If you need the machine for performance measurements, use an exclusive reservation:

apex% salloc -p fox --exclusive

Otherwise, specify the CPUs that you need so other users can also use the node at the same time:

apex% salloc -p fox -c 8

Then use srun to execute an interactive shell:

apex% srun --pty $SHELL
fox%

Make sure you get all CPUs you expect:

fox% grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list:	0-191
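
The allowed list can also be turned into a count, which is convenient to compare against the value passed to `-c`. The following is a minimal sketch, assuming the usual kernel list syntax of comma-separated CPU IDs and ranges; `count_cpus` is a hypothetical helper name, not something installed on fox:

```shell
# Count the CPUs named by a kernel CPU list such as "0-7,96-103".
# count_cpus is a hypothetical helper, defined here for illustration only.
count_cpus() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F- '
        NF == 0 { next }                        # skip empty entries
        NF == 2 { n += $2 - $1 + 1; next }      # a range "first-last"
                { n += 1 }                      # a single CPU ID
        END     { print n + 0 }'
}

# Usage inside an allocation:
#   count_cpus "$(awk '/Cpus_allowed_list/ { print $2 }' /proc/self/status)"
```

For the `0-191` list shown above this prints 192. Note that Slurm may round a `-c` request up to whole cores, so the count can be larger than the number requested.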

Follow these steps if you don't have access to apex or fox.

CUDA

To use CUDA, place the following flake.nix in a new directory. It loads all the required dependencies:

{
  inputs.jungle.url = "jungle";

  outputs = { jungle, ... }: {
    devShell.x86_64-linux = let
      pkgs = jungle.nixosConfigurations.fox.pkgs;
    in pkgs.mkShell {
      name = "cuda-env-shell";
      buildInputs = with pkgs; [
        git gitRepo gnupg autoconf curl
        procps gnumake util-linux m4 gperf unzip

        # Cuda packages (more at https://search.nixos.org/packages)
        cudatoolkit linuxPackages.nvidia_x11
        cudaPackages.cuda_cudart.static
        cudaPackages.libcusparse

        libGLU libGL
        xorg.libXi xorg.libXmu freeglut
        xorg.libXext xorg.libX11 xorg.libXv xorg.libXrandr zlib
        ncurses5 stdenv.cc binutils
      ];
      shellHook = ''
        export CUDA_PATH=${pkgs.cudatoolkit}
        export LD_LIBRARY_PATH=/var/run/opengl-driver/lib
        export SMS=50
      '';
    };
  };
}

Then just run nix develop from the same directory:

% mkdir cuda
% cd cuda
% vim flake.nix
[...]
% nix develop
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
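
Checking `nvcc` only confirms that the toolkit is available, not that the GPUs can be reached. As a hedged sketch, the hypothetical helper `list_gpus` below queries the driver with `nvidia-smi`, assuming that tool is in your `PATH` inside the shell (this is not guaranteed by the flake above):

```shell
# List the NVIDIA GPUs visible to the current shell.
# list_gpus is a hypothetical helper; it assumes nvidia-smi is in PATH.
list_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One CSV line per GPU: index, model name and total memory.
        nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader
    else
        echo "nvidia-smi not found in PATH" >&2
        return 1
    fi
}
```

On fox, two lines are expected, one per GPU. If fewer appear, the GPUs may not be visible from your current session.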

AMD uProf

The AMD uProf performance analysis tool-suite is installed and ready to use.

See the AMD uProf user guide (PDF backup for v5.1) for more details on how to use the tools. To use the GUI, make sure that you connect to fox with X11 forwarding enabled.

Filesystems

The machine has several file systems available.

  • /nfs/home: The /home from apex via NFS, which is also shared with other xeon machines. It has about 2 ms of latency, so it is not suitable for quick random access.
  • /nvme{0,1}/$USER: The two local NVMe disks, which are very fast and have a large capacity.
  • /tmp: tmpfs, fast but not backed by a disk. Will be erased on reboot.
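
As a sketch of how a job script could choose among these file systems, the hypothetical helper `first_writable_dir` below prints the first directory that exists and is writable. The candidate order in the usage comment (local NVMe first, then /tmp) is an assumption about what most jobs want, not a site policy:

```shell
# Print the first existing, writable directory among the arguments.
# first_writable_dir is a hypothetical helper, defined for illustration only.
first_writable_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # None of the candidates can be used.
    return 1
}

# Usage: prefer the local NVMe disks and fall back to /tmp.
#   scratch=$(first_writable_dir "/nvme0/$USER" "/nvme1/$USER" /tmp)
```

Keep in mind that anything left in /tmp is lost on reboot, so copy results you want to keep back to /nfs/home or an NVMe directory.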