| title | description | date |
|---|---|---|
| Fox | AMD Genoa 9684X with 2 NVIDIA RTX4000 GPUs | 2025-02-12 |
Picture by Joanne Redwood, CC0.
The fox machine is a large GPU server configured to run heavy workloads. It has two fast AMD CPUs with a large cache and two reasonably capable NVIDIA GPUs. Here are the detailed specifications:
- 2x AMD GENOA X 9684X DP/UP 96C/192T 2.55G 1,150M 400W SP5 3D V-Cache
- 24x 32GB DDR5-4800 ECC RDIMM (total 768 GiB of RAM)
- 1x 2.5" SSD SATA3 MICRON 5400 MAX 480GB
- 2x 2.5" KIOXIA CM7-R 1.92TB NVMe GEN5 PCIe 5x4
- 2x NVIDIA RTX4000 ADA Gen 20GB GDDR6 PCIe 4.0
Access
To access the machine, request a SLURM session from apex using the fox
partition and set the time for the reservation (the default is 1 hour). If you
need the machine for performance measurements, use an exclusive reservation:
apex% salloc -p fox -t 02:00:00 --exclusive
fox%
Otherwise, specify the CPUs that you need so other users can also use the node at the same time:
apex% salloc -p fox -t 02:00:00 -c 8
fox%
Make sure you get all CPUs you expect:
fox% grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list: 0-191
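With a shared reservation the list is often a set of ranges rather than a single one, which makes it awkward to compare against the value given to -c. The following is a minimal sketch that counts the CPUs in such a list. The helper name count_cpus is hypothetical and not part of the machine setup, and it assumes the usual Linux list format of comma-separated single CPUs and first-last ranges:

```shell
# Hypothetical helper: count the CPUs in a Linux CPU list such as "0-7,96-103".
count_cpus() {
    echo "$1" | awk -F, '{
        n = 0
        for (i = 1; i <= NF; i++) {
            # Each field is either a single CPU ("5") or a range ("0-7").
            split($i, r, "-")
            n += (r[2] == "" ? 1 : r[2] - r[1] + 1)
        }
        print n
    }'
}

# Count the CPUs allowed for the current shell, when /proc is available.
if [ -r /proc/self/status ]; then
    allowed=$(awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status)
    echo "allowed CPUs: $(count_cpus "$allowed")"
fi
```

The nproc command from coreutils also reports the number of CPUs available to the current process, so it can serve as a quick cross-check.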
Follow these steps if you don't have access to apex or fox.
CUDA
To use CUDA you'll need to load the NVIDIA nvcc compiler and some additional
libraries in the environment. Clone the following example and modify the
flake.nix if needed to add additional packages.
Then just run nix develop from the same directory to spawn a new shell with
the CUDA environment:
fox% git clone https://jungle.bsc.es/git/rarias/devshell
fox% cd devshell/cuda
fox% nix develop
fox$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
fox$ make
nvcc -ccbin g++ -m64 -Wno-deprecated-gpu-targets -o cudainfo cudainfo.cpp
fox$ ./cudainfo
./cudainfo Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
...
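Since the node has two GPUs and may be shared with other users, it can be useful to restrict a program to a single device. One possible way is the CUDA_VISIBLE_DEVICES environment variable, which the CUDA runtime reads at startup. The wrapper below is only a sketch; the name run_on_gpu is hypothetical, and it assumes that device indices 0 and 1 correspond to the two GPUs on fox:

```shell
# Hypothetical wrapper: run a command with only one GPU visible to CUDA.
# Usage: run_on_gpu <device-index> <command> [args...]
run_on_gpu() {
    gpu=$1
    shift
    # The variable is set only for this command, not for the whole shell.
    CUDA_VISIBLE_DEVICES=$gpu "$@"
}

# Example (assumes cudainfo was built as shown above):
#   run_on_gpu 1 ./cudainfo
```

Setting the variable per command instead of exporting it keeps the rest of the session unaffected.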
AMD uProf
The AMD uProf performance analysis tool-suite is installed and ready to use.
See the AMD uProf user guide (PDF backup for v5.1) for more details on how to use the tools. To use the GUI make sure that you connect to fox using X11 forwarding.
Filesystems
The machine has several file systems available.
- /nfs/home: The /home from apex via NFS, which is also shared with other xeon machines. It has about 2 ms of latency, so not suitable for quick random access.
- /nvme{0,1}/$USER: The two local NVME disks, very fast and large capacity.
- /tmp: tmpfs, fast but not backed by a disk. Will be erased on reboot.
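For jobs that do a lot of random I/O, a script can choose a working directory on the local disks and fall back to /tmp when they are not available. This is a minimal sketch under the assumption that the per-user directories under /nvme0 and /nvme1 already exist; the helper name pick_scratch is hypothetical:

```shell
# Hypothetical helper: print the first candidate directory that exists
# and is writable; return non-zero if none qualifies.
pick_scratch() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Prefer the local NVMe disks, then fall back to /tmp (tmpfs).
user=${USER:-$(id -un)}
scratch=$(pick_scratch "/nvme0/$user" "/nvme1/$user" /tmp)
echo "using scratch directory: $scratch"
```

Anything left in /tmp is lost on reboot, so results worth keeping should be copied back to /nfs/home or to the NVMe directories when the job finishes.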
