Basic submit script examples

Basic examples of job submission files

'Job submission file' is the official SLURM name for the file you use to submit your program and ask for resources from the job scheduler. Here we will be using it interchangeably with 'script' or 'batch script'.

Commands to the batch scheduler are prefaced with #SBATCH; these are also called directives. You can also add normal shell commands to the script.

All SLURM directives can be given on the command line instead of in the script.
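
As an illustration, the options from a script's #SBATCH lines can be passed to sbatch directly. The sketch below only assembles and prints such a command, so it can be tried on a machine without SLURM; the account placeholder and the script name my_mpi_jobscript are stand-ins. Options given on the command line take precedence over the same options in the script.

```shell
#!/bin/bash
# Placeholder values; replace with your own account and script name.
account="<account>"
jobscript="my_mpi_jobscript"

# The same options that would otherwise appear as #SBATCH lines.
cmd=(sbatch -A "$account" -n 4 --time=00:30:00 "$jobscript")

# Print the command instead of running it, so this is safe to try anywhere.
echo "${cmd[*]}"

# To actually submit the job, run: "${cmd[@]}"
```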

Remember: the scripts, and all programs called by them, must be executable!
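
A minimal sketch of setting and checking the execute permission with chmod. The file name job1 and the temporary directory are hypothetical and only serve to make the example self-contained.

```shell
#!/bin/bash
# Create a throwaway script in a temporary directory (hypothetical name).
workdir=$(mktemp -d)
script="$workdir/job1"
printf '#!/bin/bash\necho "hello from job1"\n' > "$script"

# Add execute permission for the owner, then verify it.
chmod u+x "$script"
if [ -x "$script" ]; then
    status="executable"
else
    status="not executable"
fi
echo "$script is $status"
```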

The examples below assume you are submitting the job from the same directory your program is located in - otherwise you need to give the full path.

A basic example.

This script asks for 4 tasks and a run time of no more than 30 minutes in the account <account>, and runs the MPI program "my_mpi_program".

#!/bin/bash
#SBATCH -A <account>
#SBATCH -n 4
#SBATCH --time=00:30:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

# And finally run the job
srun ./my_mpi_program

Submit the job with

sbatch my_mpi_jobscript

assuming the above script is named my_mpi_jobscript.

Running two executables per node (two serial jobs).

Here job1 and job2 can be any serial scripts or executables. The drawback of this example is that any output from job1 or job2 gets mixed together in the batch job's output file.

You then submit them both with a script like this.

#!/bin/bash
#SBATCH -A <account>
#SBATCH -n 2
#SBATCH --time=00:30:00 

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

# Use '&' to start the first job in the background
srun -n 1 ./job1 &
srun -n 1 ./job2 

# Use 'wait' as a barrier to collect both executables when they are done.
# Without it, the batch job finishes as soon as job2 finishes and kills
# job1 if it is still running.
wait
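
One way around the mixed output is to give each background command its own output file. The following is a minimal sketch that can be run without SLURM: the shell functions job1 and job2 are hypothetical stand-ins for "srun -n 1 ./job1" and "srun -n 1 ./job2", and the output file names are assumptions.

```shell
#!/bin/bash
# Stand-ins for the two serial programs (hypothetical; in a real batch
# script these would be 'srun -n 1 ./job1' and 'srun -n 1 ./job2').
job1() { sleep 1; echo "job1 done"; }
job2() { echo "job2 done"; }

outdir=$(mktemp -d)

# Start both in the background, each with its own output file.
job1 > "$outdir/job1.out" 2>&1 &
job2 > "$outdir/job2.out" 2>&1 &

# Without 'wait' the script would exit while job1 is still sleeping.
wait

cat "$outdir/job1.out" "$outdir/job2.out"
```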

Naming output/error files

Normally, SLURM produces one output file called slurm-<jobid>.out containing the combined standard output and errors from the run (though files created by the program itself will of course also be created). If you wish to rename the output and error files, and get them in separate files, you can do something similar to this:

#!/bin/bash
#SBATCH -A <account> 
#SBATCH -n 2
#SBATCH --time=00:05:00 
#SBATCH --error=job.%J.err 
#SBATCH --output=job.%J.out

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

# And finally run the job
srun ./my_mpi_program
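
The job ID is also available inside the script as the environment variable SLURM_JOB_ID, which can be handy for naming files that the script creates itself. A small sketch, assuming a dummy ID of 12345 when run outside a batch job and a hypothetical file name pattern:

```shell
#!/bin/bash
# SLURM sets SLURM_JOB_ID inside a batch job; use a dummy value elsewhere.
jobid="${SLURM_JOB_ID:-12345}"

# Hypothetical name for a file written by the script itself.
logfile="timing.${jobid}.log"
echo "Extra log output would go to $logfile"
```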

Running an MPI job

2 nodes, 56 cores, 1 hour, 8 GB of memory per task. The example below is for Kebnekaise, which has 28 cores per node. Change the number of nodes accordingly for the cluster you are submitting jobs to.

#!/bin/bash
#SBATCH -A <account>
#SBATCH -N 2
# use --exclusive to get the whole nodes exclusively for this job
#SBATCH --exclusive
#SBATCH --time=01:00:00
# This job needs 8GB of memory per MPI task (= MPI rank).
# The nodes have 4500MB of memory per core when all 28 cores are used,
# so each task is given 2 cores. That means using 2 nodes and running
# only half as many tasks as there are cores.
#SBATCH -c 2

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

# And finally run the job

srun ./mpi_memjob

Notes

Load any needed modules in the script.
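
The reasoning in the comments of the MPI script above can be checked with a little shell arithmetic. This is only a sketch: the variable names are made up, 8GB is taken as 8000MB, and the other numbers are the ones quoted in the example.

```shell
#!/bin/bash
mem_per_task_mb=8000   # memory needed per MPI task (8GB, taken as 8000MB)
mem_per_core_mb=4500   # memory available per core on the node
cores_per_node=28
nodes=2

# Round up: number of cores each task needs to get enough memory.
cores_per_task=$(( (mem_per_task_mb + mem_per_core_mb - 1) / mem_per_core_mb ))

# How many such tasks fit on a node, and in the whole allocation.
tasks_per_node=$(( cores_per_node / cores_per_task ))
total_tasks=$(( tasks_per_node * nodes ))

echo "-c $cores_per_task gives $tasks_per_node tasks per node, $total_tasks tasks in total"
```

With these numbers the sketch arrives at 2 cores per task, which matches the "-c 2" directive in the script.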

An MPI job which reports various useful information as well

Kebnekaise example, 2 nodes, 14 tasks spread evenly over these two nodes.

#!/bin/bash
#SBATCH -A <account>
#SBATCH -n 14
# Spread the tasks evenly among the nodes
#SBATCH --ntasks-per-node=7
#SBATCH --time=35:15:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

echo "Starting at $(date)"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Current working directory is $(pwd)"

srun ./mpi_memjob
echo "Program finished with exit code $? at: $(date)"
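
If the exit code is needed more than once, for example to report it and also to return it from the script, it can be stored in a variable right after the program finishes. A minimal sketch that runs without SLURM; run_program is a hypothetical stand-in for "srun ./mpi_memjob" and is made to fail with status 2 on purpose.

```shell
#!/bin/bash
# Stand-in for 'srun ./mpi_memjob' (hypothetical); returns a non-zero status.
run_program() { return 2; }

echo "Starting at $(date)"

# Store the exit code immediately, before any other command overwrites $?.
rc=0
run_program || rc=$?

echo "Program finished with exit code $rc at: $(date)"

# A real batch script could end with: exit "$rc"
```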

Running fewer MPI tasks than the cores you have available

#!/bin/bash 
# Account name to run under 
#SBATCH -A <account>
# a sensible name for the job
#SBATCH -J my_job_name
# ask for 4 full nodes
#SBATCH -N 4
#SBATCH --exclusive       
# ask for 1 day and 3 hours of run time
#SBATCH -t 1-03:00:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

# run only 1 MPI task/process on a node, regardless of how many cores the nodes have.
srun -n 4 --ntasks-per-node=1 ./my_mpi_program
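
The time limit in the script above uses the days-hours:minutes:seconds form. As a quick check of what such a value amounts to, this sketch converts it to minutes. The function name to_minutes is hypothetical, and only this one format is handled.

```shell
#!/bin/bash
# Convert a time limit of the form D-HH:MM:SS to whole minutes.
# Seconds are ignored; other time formats are not handled.
to_minutes() {
    local days=${1%%-*}   # part before the dash
    local rest=${1#*-}    # HH:MM:SS
    local h m s
    IFS=: read -r h m s <<< "$rest"
    # 10# forces base 10 so that values like "08" are not read as octal.
    echo $(( 10#$days * 1440 + 10#$h * 60 + 10#$m ))
}

to_minutes "1-03:00:00"   # 1 day and 3 hours, prints 1620
```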

Multiple Parallel Jobs Sequentially

Here we run several jobs one after another. The example is for Kebnekaise. A similar script works for serial jobs; just remove srun from the commands.

#!/bin/bash
#SBATCH -A <account>
# Asking for one hour. Adjust accordingly. Remember, the time must be long enough
# for all of the sub-jobs to complete
#SBATCH -t 01:00:00
# Ask for the number of cores needed by the sub-job that uses the most cores.
# It is best to combine sub-jobs that use about the same number of cores,
# so that cores do not sit idle while a smaller sub-job runs.
#SBATCH -n 14
#SBATCH -c 2

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

# You can also do other things here, such as copying files.
# The output (and errors) of each run is sent to the files myoutput1, myoutput2 and myoutput3.
# If your job only writes output to a file on its own and not to standard out, there is likely no reason to do this.
# Each output file is then copied somewhere else before the next executable is run.
srun ./a.out > myoutput1 2>&1
cp myoutput1 /proj/nobackup/snicXXXX-YY-ZZ/mydatadir
srun ./b.out > myoutput2 2>&1
cp myoutput2 /proj/nobackup/snicXXXX-YY-ZZ/mydatadir
srun ./c.out > myoutput3 2>&1
cp myoutput3 /proj/nobackup/snicXXXX-YY-ZZ/mydatadir
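
When there are many sub-jobs, the run-and-copy pattern above can be written as a loop. The sketch below runs without SLURM: run_step is a hypothetical stand-in for "srun ./<program>", and temporary directories stand in for the working directory and the project storage directory.

```shell
#!/bin/bash
# Hypothetical stand-in for 'srun ./<program>'; it only prints one line.
run_step() { echo "output of $1"; }

datadir=$(mktemp -d)   # stands in for the project storage directory
workdir=$(mktemp -d)   # stands in for the submit directory
cd "$workdir" || exit 1

i=0
for prog in a.out b.out c.out; do
    i=$((i + 1))
    # Run the step, save its output and errors, then copy the file away.
    run_step "$prog" > "myoutput$i" 2>&1
    cp "myoutput$i" "$datadir"
done

ls "$datadir"
```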

Multiple Parallel Jobs Simultaneously

Here we run several jobs at the same time. Make sure you ask for enough cores that all jobs can run at the same time, and have enough memory. Of course, this will also work for serial jobs - just remove the srun from the command line.

Notice the "&" at the end of each srun command. Also note the "wait" command at the end of the batch script. It is important because it makes sure the batch job waits until all the simultaneous sruns are completed.

#!/bin/bash
#SBATCH -A <account>
# Asking for two hours. Adjust accordingly. Remember, the time must be long enough
# for all of the jobs to complete, even the longest
#SBATCH -t 02:00:00
# Ask for the total number of cores the jobs need
#SBATCH -n 56 

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job
module load foss/2019a

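# And finally run the jobs. Each job writes to its own output file. The file
# must not have the same name as the executable, or the redirection overwrites it.
srun -n 8 --cpu_bind=cores ./a.out > a.output 2>&1 &
srun -n 38 --cpu_bind=cores ./b.out > b.output 2>&1 &
srun -n 10 --cpu_bind=cores ./c.out > c.output 2>&1 &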

wait
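
A plain "wait" does not tell you which of the background commands failed. If that matters, one possible approach is to record each process ID and wait for them one by one. This is a sketch that runs without SLURM; run_a and run_b are hypothetical stand-ins for the srun commands, and run_b is made to fail on purpose.

```shell
#!/bin/bash
# Stand-ins for the background srun commands (hypothetical).
run_a() { sleep 1; return 0; }
run_b() { return 3; }   # simulate a failing job

# Start both in the background and remember their process IDs.
run_a & pid_a=$!
run_b & pid_b=$!

# 'wait <pid>' returns the exit status of that background process.
status_a=0; wait "$pid_a" || status_a=$?
status_b=0; wait "$pid_b" || status_b=$?

echo "a exited with $status_a, b exited with $status_b"
```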
