SLURM Submit File Design

To make the best use of the resources with Slurm, you need some basic information about the application you want to run.

Slurm will do its best to fit your job into the cluster, but you have to give it some hints about what you want it to do.

The parameters described below can be given directly as arguments to srun and sbatch.

If you don't give Slurm enough information, it will try to fit your job for the best throughput (lowest possible queue time). This approach will not always give the best performance for your job.
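
For example, the number of tasks (described below) can either be set inside the submit file:

#SBATCH -n 48

or be passed directly on the command line when submitting (the script name job.sh is just a placeholder):

$ sbatch -n 48 job.sh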

To get the best performance, you will need to know the parameters described in the sections below.

Some extra parameters that might be useful are also covered.

For basic examples for different job types, see the separate example pages.

Some applications may have special needs in order to run at full speed.
Look at the application-specific pages for more information about any such special requirements.
Some commonly used programs are listed below.

First line in submit file

The submit file must start with:

#!/bin/bash

This is required for the module system to work. There are other possibilities, but this is the only one we fully support.
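
As a point of reference, a complete submit file might look like the minimal sketch below. The project id, time limit and program name are placeholders that you need to replace with your own values:

#!/bin/bash
# Your project id (see below)
#SBATCH -A SNIC000-00-000
# Number of tasks
#SBATCH -n 1
# Runtime limit
#SBATCH --time=00:10:00

# Start your program (placeholder name)
./my_program

Submit the script with sbatch (the file name is again a placeholder):

$ sbatch jobscript.sh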

Your account (-A)

The account is your project id; it is mandatory.

Example:

#SBATCH -A SNIC000-00-000

You can find your project id by running:

$ projinfo

The number of tasks (-n)

For most use cases, the number of tasks is the number of processes you want to start. The default value is one (1).

An example could be the number of MPI tasks or the number of serial programs you want to start.

Example:

#SBATCH -n 48

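As an illustration, a sketch of a submit file for a 48-task MPI job could look like this. The program name is a placeholder, and you would normally also load whatever modules your application needs:

#!/bin/bash
#SBATCH -A SNIC000-00-000
# 48 MPI tasks
#SBATCH -n 48
#SBATCH --time=01:00:00

# Load the modules your program needs here

# srun starts one copy of the program per task
srun ./my_mpi_program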

The number of cores per task (-c)

If your application is multi-threaded (OpenMP, ...), this number indicates the number of cores each task can use.
The default value is one (1).

Example:

#SBATCH -c 14

On Kebnekaise the number of cores depends on which type of node you are running on. Generally, the nodes have 28 cores, except for the largemem nodes which have 72 cores, and the KNL nodes which have 68 cores. For more information, see the Kebnekaise hardware page.
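
As a sketch, a single-task OpenMP job using 14 cores could be requested like this; the program name is a placeholder:

#!/bin/bash
#SBATCH -A SNIC000-00-000
# One task...
#SBATCH -n 1
# ...with 14 cores for its threads
#SBATCH -c 14
#SBATCH --time=01:00:00

# Let the program use as many threads as cores allocated to the task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_openmp_program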

The number of tasks per node (--ntasks-per-node)

If your job requires more cores than are available in one node (28 on Kebnekaise), it might be wise to set the number of tasks per node, depending on your job. This is the (minimum) number of tasks allocated per node.

Remember that the total number of cores is the number of tasks times the number of cores per task.

There are 28 cores per node on Kebnekaise, so this is the maximum number of tasks per node for that system.

If you don't set this option, Slurm will try to spread the task(s) over as few available nodes as possible. For a 28-task job this can result in, for example, 22 tasks on one node and 6 on another (Kebnekaise has 28 cores on a regular Broadwell node). If you let Slurm spread your job it is more likely to start sooner, but the performance of the job might suffer. If you are using more than 28 cores (a regular Broadwell node on Kebnekaise) and are unsure of how your application behaves, it is probably a good idea to spread the tasks evenly over the required number of nodes.

There is no need to tell Slurm how many nodes your job needs; it will do the math.

Example:

#SBATCH --ntasks-per-node=24
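
For example, a 48-task job can be spread evenly over two regular Kebnekaise nodes (24 tasks on each) like this, instead of letting Slurm pack 28 tasks on one node and 20 on another:

#SBATCH -n 48
#SBATCH --ntasks-per-node=24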

Memory usage

RAM per core:

  Kebnekaise (Broadwell)    4450 MB
  Kebnekaise (Skylake)      6750 MB
  Kebnekaise largemem      41666 MB

Each core has a limited amount of memory available. If your job requires more memory per task than the default, you can allocate more cores for each task with -c.

If, for instance, you need 7000MB/task on a Kebnekaise broadwell node, set "-c 2".

Example:

# I need 2 x 4450MB (8900MB) of memory for my job.
#SBATCH -c 2

This will allocate two (2) cores with 4450 MB each. If your code is not multi-threaded (using only one core per task), the extra core will simply add its memory to your job.

If your job requires more than 190000 MB per node on Kebnekaise (Skylake), there is a limited number of nodes with 3072000 MB of memory which you may be allowed to use (you apply for them as a separate resource when you make your project proposal in SUPR). They are accessed by selecting the largemem partition of the cluster, which you do by setting -p largemem.

Example:

#SBATCH -p largemem
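
Put together, the header of a job that needs the large-memory nodes might look like the sketch below, assuming your project has been granted access to the largemem partition. The number of tasks and the time limit are just examples:

#SBATCH -A SNIC000-00-000
#SBATCH -p largemem
#SBATCH -n 72
#SBATCH --time=1-00:00:00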

The run/wallclock time (--time, --time-min)

If you know the runtime (wall clock time) of your job, it is beneficial to set this value as accurately as possible.

Shorter jobs are more likely to fit into gaps of otherwise unused time, and therefore tend to start sooner.

Note: Please add some extra time to account for variances in the system.

The maximum allowed runtime of any job is seven (7) days.

The format is:

D-HH:MM:SS (D=Day(s), HH=Hour(s), MM=Minute(s), SS=Second(s))

Example:

# Runtime limit 2 days, 12hours
#SBATCH --time 2-12:00:00

You can also use the --time-min option to set a minimum time for your job.
If you use this, Slurm will try to find a slot longer than --time-min but shorter than --time. This is useful if your job does periodic checkpoints of data and can restart from the latest one. This technique can be used to fill openings in the system that no big jobs can fill, and so gives better throughput for your jobs.

Example:

# Runtime limit 2 days, 12hours
#SBATCH --time 2-12:00:00
#
# Minimum runtime limit 1 days, 12hours
#SBATCH --time-min 1-12:00:00

The number of nodes (-N)

It is possible to set the number of nodes that Slurm should allocate for your job.

This should only be used together with --ntasks-per-node or with --exclusive.

In almost every case, however, it is better to let Slurm calculate the number of nodes required for your job from the number of tasks, the number of cores per task, and the number of tasks per node.
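
If you do set it, a sketch of the combination with --ntasks-per-node could look like this, requesting two full regular Kebnekaise nodes:

# Two nodes...
#SBATCH -N 2
# ...with 28 tasks on each
#SBATCH --ntasks-per-node=28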

Sending output to files (--output/--error)

The standard output (stdout) and standard error (stderr) from your program can be collected with the --output and --error options to sbatch.

Example:

# Send stderr of my program into <jobid>.error
#SBATCH --error=%J.error

# Send stdout of my program into <jobid>.output
#SBATCH --output=%J.output

The files in the example will end up in the working directory of your job.

Send mail on job changes (--mail-type)

Slurm can send mail to you when certain event types occur. Valid type values are: BEGIN, END, FAIL, REQUEUE, and ALL (any state change).

Example:

# Send mail when job ends
#SBATCH --mail-type=END

Note: We recommend that you do NOT ask the batch system to send an email when the job has finished, particularly if you are running large numbers of jobs. The reason is that many mail servers have a limit and may temporarily block accounts (or domains) that send too many mails. Instead use

scontrol show job <jobid>

or

squeue -l -u <username>

to see the status of your job(s).

Exclusive (--exclusive)

In some use cases it is useful to ask for a complete node (not allowing any other jobs to share it).

--exclusive can be used with -N (number of nodes) to get all the cores and memory on the node(s) exclusively for your job.

Example:

# Request complete nodes
#SBATCH --exclusive
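
Combined with -N, as mentioned above, a sketch could look like this:

# Request two complete nodes, with all their cores and memory
#SBATCH -N 2
#SBATCH --exclusive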

Common programs which have special requirements
