Slurm GPU Resources (Kebnekaise)

NOTE: your project needs to have time allocated on the GPU nodes to use them, as they are now considered a separate resource. To request them, use the Slurm options described below. The V100s need no specific partition, but the A100s do; see below.

We have two types of GPU cards available on Kebnekaise, NVIDIA Tesla V100 (Volta) and NVIDIA A100 (Ampere).

To request GPU resources, include a GRES (generic resource) request in the submit file. The general format is:

#SBATCH --gres=gpu:<type-of-card>:x

where <type-of-card> is either v100 or a100 and x is the number of cards, 1 or 2.
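
To show where the GRES line sits in a submit file, here is a minimal sketch of a job script that requests one V100 card. The project ID hpc2nXXXX-YYY and the wall time are placeholders assumed for the example, not values from this page; replace them with your own.

```shell
#!/bin/bash
# Hypothetical project ID; replace with your own project.
#SBATCH -A hpc2nXXXX-YYY
# Assumed wall time for the example (10 minutes).
#SBATCH --time=00:10:00
# Request one V100 card (x = 1).
#SBATCH --gres=gpu:v100:1

# Slurm normally exports CUDA_VISIBLE_DEVICES for GPU jobs; fall back to
# "none" so the script also runs outside an allocation.
visible_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${visible_gpus}"
```

Submitted with sbatch, the script should report the device index of the allocated card. Run directly in a shell, it prints "none" unless CUDA_VISIBLE_DEVICES is already set.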

The V100 enabled nodes contain two V100 cards each.

The A100 enabled nodes contain two A100 cards each.

On these dual-card nodes one can request either a single card (x = 1) or both (x = 2). For each requested card, a whole CPU socket (14 cores for V100, 24 cores for A100) is also dedicated on the same node. Each card is connected to the PCI Express bus of the corresponding CPU socket.
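
The core counts that follow from this rule can be worked out with a small shell function. This is only an illustrative sketch: dedicated_cores is a hypothetical helper, not a Slurm or HPC2N command, and it simply multiplies the per-socket core counts quoted above by the number of requested cards.

```shell
#!/bin/bash
# Hypothetical helper: cores dedicated to a job, using the figures above
# (one CPU socket per card, 14 cores for V100, 24 cores for A100).
dedicated_cores() {
    local card="$1" count="$2" per_socket
    case "$card" in
        v100) per_socket=14 ;;
        a100) per_socket=24 ;;
        *) echo "unknown card type: $card" >&2; return 1 ;;
    esac
    echo $(( per_socket * count ))
}

dedicated_cores v100 1   # prints 14
dedicated_cores a100 2   # prints 48
```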

If required, the NVIDIA Multi-Process Service (MPS) can be activated by using:

#SBATCH --gres=gpu:v100:x,nvidiamps

If the code that will run on the allocated resources expects the GPUs to be in exclusive mode (the default is shared), select this with "gpuexcl":

#SBATCH --gres=gpu:v100:x,gpuexcl
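
One way to check which mode a job actually received is to print the compute mode that nvidia-smi reports for each card. The sketch below requests both V100 cards in exclusive mode. The project ID and wall time are placeholders, and it is an assumption that nvidia-smi is on the PATH on the GPU nodes.

```shell
#!/bin/bash
# Hypothetical project ID and wall time; replace with your own values.
#SBATCH -A hpc2nXXXX-YYY
#SBATCH --time=00:10:00
# Request both V100 cards in exclusive mode.
#SBATCH --gres=gpu:v100:2,gpuexcl

# Query the compute mode of each visible GPU, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    compute_modes="$(nvidia-smi --query-gpu=index,name,compute_mode --format=csv,noheader 2>&1)"
else
    compute_modes="nvidia-smi not found"
fi
echo "Compute mode per GPU: ${compute_modes:-unknown}"
```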

NOTE: for the A100s, you also need to add the partition:

#SBATCH -p amd_gpu
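
The partition rule can be summarised in a small shell function that prints the directives needed for a given card type. gpu_directives is a hypothetical helper written for this illustration; it only encodes the rule stated on this page, that A100 requests need the amd_gpu partition and V100 requests do not.

```shell
#!/bin/bash
# Hypothetical helper: print the #SBATCH lines needed for a GPU request.
gpu_directives() {
    local card="$1" count="$2"
    case "$card" in
        v100|a100) ;;
        *) echo "unknown card type: $card" >&2; return 1 ;;
    esac
    echo "#SBATCH --gres=gpu:${card}:${count}"
    # Only the A100 nodes need an explicit partition.
    if [ "$card" = "a100" ]; then
        echo "#SBATCH -p amd_gpu"
    fi
}

gpu_directives a100 2
```

The printed lines can be pasted into the header of a submit file, before any executable command.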
Updated: 2024-04-17, 14:47