---
title: "Fox"
description: "AMD Genoa 9684X with 2 NVIDIA RTX4000 GPUs"
date: 2025-02-12
---

Picture by [Joanne Redwood](https://web.archive.org/web/20191109175146/https://www.inaturalist.org/photos/6568074),
[CC0](http://creativecommons.org/publicdomain/zero/1.0/deed.en).

The *fox* machine is a big GPU server configured to run heavy workloads. It has
two fast AMD CPUs with a very large L3 cache (3D V-Cache) and two NVIDIA
workstation GPUs. Here are the detailed specifications:

- 2x AMD GENOA X 9684X DP/UP 96C/192T 2.55G 1,150M 400W SP5 3D V-Cache
- 24x 32GB DDR5-4800 ECC RDIMM (total 768 GiB of RAM)
- 1x 2.5" SSD SATA3 MICRON 5400 MAX 480GB
- 2x 2.5" KIOXIA CM7-R 1.92TB NVMe GEN5 PCIe 5x4
- 2x NVIDIA RTX4000 ADA Gen 20GB GDDR6 PCIe 4.0

## Access

To access the machine, request a SLURM session from [apex](/apex) using the `fox`
partition. If you need the machine for performance measurements, use an
exclusive reservation:

    apex% salloc -p fox --exclusive

Otherwise, specify the CPUs that you need so other users can also use the node
at the same time:

    apex% salloc -p fox -c 8

Then use `srun` to execute an interactive shell:

    apex% srun --pty $SHELL
    fox%

Make sure you get all the CPUs you expect:

    fox% grep Cpus_allowed_list /proc/self/status
    Cpus_allowed_list: 0-191
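
If you don't need an interactive session, you can also submit work as a batch
job with `sbatch`. Here is a minimal sketch of such a script (the job name,
time limit, CPU count and the `./myapp` binary are placeholders, adjust them
to your workload):

    #!/bin/bash
    # Sketch: adjust these placeholder values to your workload.
    #SBATCH --partition=fox
    #SBATCH --cpus-per-task=8
    #SBATCH --time=01:00:00
    #SBATCH --job-name=myjob

    srun ./myapp

Save it as e.g. `job.sh`, submit it from apex with `sbatch job.sh`, and check
its state with `squeue`.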

Follow [these steps](/access) if you don't have access to apex or fox.

## CUDA

To use CUDA you'll need to load the NVIDIA `nvcc` compiler and some additional
libraries into the environment. Clone the [following
example](https://jungle.bsc.es/git/rarias/devshell/src/branch/main/cuda) and
modify the `flake.nix` if needed to add extra packages.

Then run `nix develop` from that directory to spawn a new shell with the CUDA
environment:

    fox% git clone https://jungle.bsc.es/git/rarias/devshell

    fox% cd devshell/cuda

    fox% nix develop

    fox$ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2025 NVIDIA Corporation
    Built on Fri_Feb_21_20:23:50_PST_2025
    Cuda compilation tools, release 12.8, V12.8.93
    Build cuda_12.8.r12.8/compiler.35583870_0

    fox$ make
    nvcc -ccbin g++ -m64 -Wno-deprecated-gpu-targets -o cudainfo cudainfo.cpp

    fox$ ./cudainfo
    ./cudainfo Starting...

    CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 2 CUDA Capable device(s)
    ...
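
To double check that both GPUs are visible from your session, you can also
query the driver directly (assuming the usual NVIDIA driver tools are
installed on fox alongside CUDA):

    fox% nvidia-smi -L

This should list the two RTX4000 devices.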

## AMD uProf

The [AMD uProf](https://www.amd.com/en/developer/uprof.html) performance
analysis tool-suite is installed and ready to use.

See the [AMD uProf user guide](https://docs.amd.com/r/en-US/57368-uProf-user-guide)
([PDF backup for v5.1](https://jungle.bsc.es/pub/57368-uprof-user-guide.pdf))
for more details on how to use the tools. To use the GUI, make sure you
connect to fox using X11 forwarding.
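
If you only need a quick profile from the command line, the suite also ships
the `AMDuProfCLI` tool. A minimal sketch of a time-based profile (the
`./myapp` binary and the output directory are placeholders; see the user
guide for the exact invocation and the full set of options):

    fox% AMDuProfCLI collect --config tbp -o /tmp/$USER/prof ./myapp
    fox% AMDuProfCLI report -i /tmp/$USER/prof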

## Filesystems

The machine has several file systems available:

- `/nfs/home`: the `/home` from apex via NFS, which is also shared with the
  other xeon machines. It has about 2 ms of latency, so it is not suitable for
  quick random access.
- `/nvme{0,1}/$USER`: the two local NVMe disks, very fast and with large
  capacity (see the example after this list).
- `/tmp`: tmpfs, fast but not backed by a disk; it will be erased on reboot.
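
For large temporary data, a per-user scratch directory on one of the local
NVMe disks is usually the best choice. A minimal sketch (the `myrun`
directory name is a placeholder):

    fox% mkdir -p /nvme0/$USER/myrun
    fox% cd /nvme0/$USER/myrun

Copy any results you want to access from other machines back to `/nfs/home`,
as the NVMe disks are local to fox.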
|