Batch systems

Examples, scripts

Example job submission files

'Job submission file' is the official SLURM name for the file you use to submit your program and ask for resources from the job scheduler. Here we will be using it interchangeably with 'script' or 'batch script'.

Commands to the batch scheduler are prefixed with #SBATCH; these are also called directives. You can also add normal shell commands to the script.

All SLURM directives can be given on the command line instead of in the script.

Remember: the scripts, and all programs called by them, must be executable!
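A minimal sketch of such a script, using a placeholder account and program name:

#!/bin/bash
# Project/Account (placeholder - use your own)
#SBATCH -A hpc2n-1234-56
#
# One task and a maximum runtime of 10 minutes
#SBATCH -n 1
#SBATCH --time=00:10:00

# Normal shell commands follow the directives
echo "Running on $(hostname)"
./my_program

Submit it with sbatch:

$ sbatch jobscript.sh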

Examples, srun

You can submit programs directly to the batch system with 'srun', giving everything on the command line, though for larger or more complicated jobs you should normally use a job script and submit it with 'sbatch'.

More information about parameters and job submission files can be found on the Slurm submit file design page.

Run two tasks, each on a different core.

$ srun -A <account> -n 2 my_program

Run 6 tasks distributed across 2 nodes.
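For example (-N gives the number of nodes):

$ srun -A <account> -N 2 -n 6 my_program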

Job Dependencies

A job can be given the constraint that it only starts after another job has finished.

In the following example, we have two Jobs, A and B. We want Job B to start after Job A has successfully completed.

First we start Job A by submitting it via sbatch:

$ sbatch <jobA.sh>

Making note of the assigned job ID for Job A, we then submit Job B with the added condition that it only starts after Job A has successfully completed:
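$ sbatch --dependency=afterok:<jobid> <jobB.sh>

Here afterok means that Job B starts only if Job A completed with exit code zero. Other dependency types also exist, such as afterany (start after Job A finishes, regardless of its exit status).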

Job Cancellation

To cancel a job, use scancel. You need the job ID of the running or pending job. Only the job's owner and SLURM administrators can cancel jobs.
$ scancel <jobid>

To cancel all your jobs (running and pending) you can run

$ scancel -u <username>

You get the job ID when you submit the job:

$ sbatch -N 1 -n 4 submitfile
Submitted batch job 173079
$ scancel 173079

Or you can look it up afterwards with squeue.
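For example, to list only your own jobs (the job ID is shown in the first column):

$ squeue -u <username>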

Job Status

To see the status of partitions and nodes, use

$ sinfo

To get the status of all SLURM jobs

$ squeue

To only view the jobs in the bigmem partition (Abisko)

$ squeue -p bigmem

To only view the jobs in the largemem partition (Kebnekaise)

$ squeue -p largemem

To get the status of an individual job

$ scontrol show job <jobid>

Slurm MPI + OpenMP examples

This example shows a hybrid MPI/OpenMP job with 4 tasks and 48 cores per task, on Abisko.

#!/bin/bash
# Example with 4 tasks and 48 cores per task for MPI+OpenMP
#
# Project/Account
#SBATCH -A hpc2n-1234-56
#
# Number of MPI tasks
#SBATCH -n 4
#
# Number of cores per task
#SBATCH -c 48
#
# Runtime of this job is less than 12 hours.
#SBATCH --time=12:00:00
#

# Set OMP_NUM_THREADS to the same value as -c
# with a fallback in case it isn't set.
# SLURM_CPUS_PER_TASK is set to the value of -c, but only if -c is explicitly set
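if [ -n "$SLURM_CPUS_PER_TASK" ]; then
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
else
    export OMP_NUM_THREADS=1
fi

# Load your MPI module and launch the hybrid program with srun
# (./mpi_openmp_program is a placeholder for your own binary)
module load openmpi/gcc
srun ./mpi_openmp_program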

Slurm OpenMP Examples

This example shows a 48-core OpenMP job (the maximum size for one node on Abisko).

    #!/bin/bash
    # Example with 48 cores for OpenMP
    #
    # Project/Account
    #SBATCH -A hpc2n-1234-56
    #
    # Number of cores per task
    #SBATCH -c 48
    #
    # Runtime of this job is less than 12 hours.
    #SBATCH --time=12:00:00
    #

    # Set OMP_NUM_THREADS to the same value as -c
    # with a fallback in case it isn't set.
    # SLURM_CPUS_PER_TASK is set to the value of -c, but only if -c is explicitly set
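    if [ -n "$SLURM_CPUS_PER_TASK" ]; then
        export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    else
        export OMP_NUM_THREADS=1
    fi

    # Run the OpenMP program (./openmp_program is a placeholder for your own binary)
    ./openmp_program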

Slurm MPI examples

This example shows a job with 48 MPI tasks and 24 tasks per node, on Abisko.

#!/bin/bash
# Example with 48 MPI tasks and 24 tasks per node.
#
# Project/Account (use your own)
#SBATCH -A hpc2n-1234-56
#
# Number of MPI tasks
#SBATCH -n 48
#
# Number of tasks per node
#SBATCH --ntasks-per-node=24
#
# Runtime of this job is less than 12 hours.
#SBATCH --time=12:00:00

module load openmpi/gcc

srun ./mpi_program

# End of submit file

This will create a job spanning two nodes, with 24 tasks on each (48 tasks / 24 tasks per node = 2 nodes).

SLURM Submit File Design

To best use the resources with Slurm you need to have some basic information about the application you want to run.

Slurm will do its best to fit your job into the cluster, but you have to give it some hints about what you want it to do.

The parameters described below can be given directly as arguments to srun and sbatch.

If you don't give SLURM enough information, it will try to fit your job for best throughput (lowest possible queue time). This approach will not always give the best performance for your job.
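As a sketch of the difference, the following two submissions run the same 48-task job (account and submit file are placeholders, as above). The first gives Slurm only the task count, so it may spread the tasks over any number of nodes; the second also gives --ntasks-per-node, pinning the layout to exactly 24 tasks on each of two nodes, as in the MPI example above:

$ sbatch -A <account> -n 48 submitfile
$ sbatch -A <account> -n 48 --ntasks-per-node=24 submitfile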


Updated: 2017-12-06, 15:21