SLURM commands and information

There are many more SLURM commands than the ones we have chosen to look at below, but these are the most commonly used. You can find more information on the SLURM homepage: SLURM documentation

You can run programs either by giving all the options on the command line or by submitting a job script. If you request the resources on the command line, you cannot use the terminal until the program has finished (unless you send it to the background with &).
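
For example, the whole request can be given directly to srun (a sketch; my_program is a placeholder for your own executable):

$ srun -A <your project> -N 1 -n 4 --time=1:30:00 my_program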

Using salloc, you get an interactive shell to run your jobs in once the nodes are allocated. Note that you cannot use the terminal while you wait, which may be a long time depending on when the job starts.

The following asks to allocate 1 node and 4 processors for 1 hour and 30 minutes. When the resources are available, you will get an interactive shell with those resources:

$ salloc -A <your project> -N 1 -n 4 --time=1:30:00 

Submitting a job script avoids this. While it may still take a long time before the job runs (depending on the load on the machine and your project's priority), you can use the terminal in the meantime.

To run within the allocation created by salloc, you must start your programs with srun; otherwise they will run on the login node.

$ srun -n 2 my_program

Serial, OpenMP, MPI, and hybrid jobs can all be submitted either directly with srun, in an interactive shell started with salloc, or through a job submission file. The environment is exported in every case, so remember to load any needed modules (such as an MPI library) first, and to set OpenMP variables if needed:

$ export OMP_NUM_THREADS=<number of threads>

You can see what the environment variable is set to with the command echo $OMP_NUM_THREADS.
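
As an illustration, a hybrid MPI + OpenMP run from an interactive allocation could look like the sketch below (2 MPI tasks with 2 OpenMP threads each). The module name (foss) and the program name are placeholders - check which modules provide MPI on your system.

$ module load foss
$ export OMP_NUM_THREADS=2
$ srun -n 2 -c 2 my_hybrid_program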

 

srun/sbatch - running programs/jobs

These commands can be used to start multiple tasks on multiple nodes, where each task is a separate process executing the same program. By default, SLURM allocates one processor per task and starts tasks on multiple processors as necessary. You can, however, specify these settings yourself and do not have to follow the defaults.
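
For example, to ask for two nodes, eight tasks, and two CPUs per task instead of the defaults (a sketch; my_program is a placeholder):

$ srun -N 2 -n 8 -c 2 my_program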

srun waits until the resources are available and does not return until the job has completed. You must use srun to start jobs within a salloc session.

$ srun -n 2 my_program

sbatch submits your submit file to the batch system and returns immediately.
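
A minimal submit file could look like the following sketch (the project name, time limit, task count, and program name are placeholders):

#!/bin/bash
#SBATCH -A <your project>
#SBATCH -n 4
#SBATCH --time=00:30:00

srun ./my_program

It is then submitted with sbatch (the file name is just an example):

$ sbatch my_jobscript.sh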

More information about parameters and job submission files can be found on the page: Slurm submit file design.

 

Partitions

SLURM uses partitions, which serve more or less the same function as queues in other batch systems.

Kebnekaise has two partitions:

The batch partition is the default and comprises all nodes with 128GB RAM. The default amount of memory allocated for jobs in this partition is 4500MB per core.

The largemem partition consists only of the nodes with 3TB RAM. The default allocation for jobs in this partition is 41666MB per core. Note 1: to use this partition, your project needs to have an explicit allocation on the large memory nodes; you can check whether this is the case in SUPR. Note 2: there are only 20 nodes in this partition, so use it only when necessary.
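
If your project does have such an allocation, the partition is requested explicitly with -p largemem, either in the submit file or on the command line. A sketch:

#SBATCH -p largemem

or, for an interactive allocation:

$ salloc -A <your project> -p largemem -N 1 -n 4 --time=1:30:00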

Information about the SLURM nodes and partitions can be found using this command:

$ sinfo

 

Topology of the system

The program hwloc-ls is very useful to see the topology of your allocation - which processors you got, where they are placed, how much memory, etc. This can be used to determine the best placement of tasks, or to see if you asked for what you thought you did!

Running hwloc-ls with n tasks and c CPUs per task (here 2 tasks and 2 CPUs per task) on Kebnekaise (a regular Broadwell node) yields:

$ srun -n 2 -c 2 hwloc-ls
Machine (125GB total)
  NUMANode L#0 (P#0 62GB)
  NUMANode L#1 (P#1 63GB) + Package L#0 + L3 L#0 (35MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#21)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#22)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#25)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#26)
  HostBridge L#0
    PCIBridge
      PCI 14e4:168e
        Net L#0 "ens6f0"
      PCI 14e4:168e
        Net L#1 "ens6f1"
    PCIBridge
      PCI 15b3:1011
        Net L#2 "ib0"
        OpenFabrics L#3 "mlx5_0"
    PCIBridge
      PCI 14e4:1665
        Net L#4 "eno1"
      PCI 14e4:1665
        Net L#5 "eno2"
    PCIBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 102b:0534
    PCI 8086:8d02
      Block(Disk) L#6 "sda"
Machine (125GB total)
  NUMANode L#0 (P#0 62GB)
  NUMANode L#1 (P#1 63GB) + Package L#0 + L3 L#0 (35MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#21)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#22)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#25)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#26)
  HostBridge L#0
    PCIBridge
      PCI 14e4:168e
        Net L#0 "ens6f0"
      PCI 14e4:168e
        Net L#1 "ens6f1"
    PCIBridge
      PCI 15b3:1011
        Net L#2 "ib0"
        OpenFabrics L#3 "mlx5_0"
    PCIBridge
      PCI 14e4:1665
        Net L#4 "eno1"
      PCI 14e4:1665
        Net L#5 "eno2"
    PCIBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 102b:0534
    PCI 8086:8d02
      Block(Disk) L#6 "sda"

 

A few useful environment variables

  • SLURM_JOB_NUM_NODES: the number of nodes allocated to the job. Useful for checking that it matches what you expected. Can be used both in an interactive shell and in a script.
  • SLURM_NTASKS: contains the number of task slots allocated.
  • SLURM_JOB_ID: contains the ID of the current job. It is only set while the job is running, so it is available both in an interactive shell and inside your batch script (see the sketch below).
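
As a small sketch of how these variables can be used, a batch script could print them as a sanity check before starting the program (my_program is a placeholder):

echo "Job $SLURM_JOB_ID runs on $SLURM_JOB_NUM_NODES node(s) with $SLURM_NTASKS task(s)"
srun -n $SLURM_NTASKS ./my_program
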
Updated: 2024-03-08, 14:54