Batch systems

Slurm GPU Resources (Kebnekaise)

NOTE: your project needs to have an allocation of time on the GPU nodes in order to use them, as they are considered a separate resource. To request them, use the SLURM option described below. For the V100s there is no specific partition you need to give, but there is for the A100s - see below.

We have two types of GPU cards available on Kebnekaise: NVIDIA Tesla V100 (Volta) and NVIDIA A100 (Ampere).

To request GPU resources, one has to include a GRES request in the submit file. The general format is:
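A sketch of the general form (the exact card-type string, here v100 as an example, is an assumption; check the cluster documentation for the current names):

#SBATCH --gres=gpu:<type-of-card>:<number-of-cards>

For example, to request two V100 cards:

#SBATCH --gres=gpu:v100:2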

Job dependencies - SLURM

A job can be given the constraint that it only starts after another job has finished.

In the following example, we have two Jobs, A and B. We want Job B to start after Job A has successfully completed.

First we start Job A by submitting it via sbatch:

$ sbatch <jobA.sh>

Making note of the assigned job ID for Job A, we then submit Job B with the added condition that it only starts after Job A has successfully completed:
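Assuming Job A was assigned the job ID <jobidA>, a minimal sketch of that submission (afterok means Job B becomes eligible only if Job A finishes with exit code zero):

$ sbatch --dependency=afterok:<jobidA> <jobB.sh>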

Deleting a job

To cancel a job, use scancel. You need the job ID of the running or pending job. Only the job's owner and SLURM administrators can cancel jobs.
$ scancel <jobid>

To cancel all your jobs (running and pending) you can run

$ scancel -u <username>

You get the job ID when you submit the job.

Job status

To see the status of partitions and nodes, use

$ sinfo

To get the status of all SLURM jobs, use

$ squeue

To view only the jobs in the largemem partition on Kebnekaise, use

$ squeue -p largemem

To get the status of an individual job, use

$ scontrol show job <jobid>

Slurm MPI + OpenMP examples

This example shows a hybrid MPI/OpenMP job with 4 tasks and 28 cores per task.

#!/bin/bash
# Example with 4 tasks and 28 cores per task for MPI+OpenMP
#
# Project/Account
#SBATCH -A hpc2n-1234-56
#
# Number of MPI tasks
#SBATCH -n 4
#
# Number of cores per task
#SBATCH -c 28
#
# The runtime of this job is less than 12 hours.
#SBATCH --time=12:00:00
#

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a
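The example above stops after loading the module environment. A minimal sketch of how the script would continue (my_mpi_omp_program is a placeholder for your own binary):

# Set OMP_NUM_THREADS to the number of cores per task (-c),
# with a fallback in case it is not set
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Launch the hybrid program; srun starts the MPI tasks requested with -n
srun ./my_mpi_omp_program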

Slurm OpenMP Examples

This example shows a 28-core OpenMP job (the maximum size for one normal node on Kebnekaise).

#!/bin/bash
# Example with 28 cores for OpenMP
#
# Project/Account
#SBATCH -A hpc2n-1234-56
#
# Number of cores
#SBATCH -c 28
#
# The runtime of this job is less than 12 hours.
#SBATCH --time=12:00:00
#

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

# Set OMP_NUM_THREADS to the same value as -c
# with a fallback in case it isn't set.
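export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Run the OpenMP program (./my_openmp_program is a placeholder for your own binary)
./my_openmp_program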

Slurm MPI examples

This example shows a job with 28 MPI tasks and 14 tasks per node, which matches the normal nodes on Kebnekaise.

#!/bin/bash
# Example with 28 MPI tasks and 14 tasks per node.
#
# Project/Account (use your own)
#SBATCH -A hpc2n-1234-56
#
# Number of MPI tasks
#SBATCH -n 28
#
# Number of tasks per node
#SBATCH --tasks-per-node=14
#
# The runtime of this job is less than 12 hours.
#SBATCH --time=12:00:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

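A sketch of the launch step that would follow (my_mpi_program is a placeholder for your own binary):

# Launch the MPI program; srun starts the 28 tasks laid out by the SBATCH directives above
srun ./my_mpi_program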
SLURM Submit File Design

To make the best use of the resources with Slurm, you need some basic information about the application you want to run.

Slurm will do its best to fit your job into the cluster, but you have to give it some hints about what you want it to do.

The parameters described below can be given directly as arguments to srun and sbatch.
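For example, the directives used in the submit files above can equally be passed on the command line (a sketch; jobscript.sh is a placeholder for your own submit file):

$ sbatch -A hpc2n-1234-56 -n 28 --time=12:00:00 jobscript.sh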

If you don't give SLURM enough information, it will try to fit your job for best throughput (lowest possible queue time). This approach will not always give the best performance for your job.

Updated: 2024-03-08, 14:54