SLURM commands and information
There are many more commands than the ones we have chosen to look at below, but these are the most commonly used ones. You can find more information in the SLURM documentation on the SLURM homepage.

Just like with PBS, you can run programs either by giving all the commands on the command line or by submitting a job script.

If you ask for the resources on the command line, you have to wait for the program to finish before you can use the window again (unless you send it to the background with &).

Using salloc, you get an interactive shell to run your jobs in once your nodes are allocated. This works like an interactive shell (-I) in PBS, including the fact that you cannot use the window while you wait, perhaps for a long time, before the job starts. The following asks for 1 node and 4 tasks (one processor each by default) for 1 hour and 30 minutes. When the resources are available, you will get an interactive shell with those resources:

salloc -N 1 -n 4 --time=1:30:00

Submitting a job script avoids this. While it may still take a long time before the job runs (depending on the load on the machine and your group's priority), you can use the window in the meantime.

Serial, OpenMP, MPI and hybrid jobs can all be submitted directly with srun, in an interactive shell started with salloc, or through a job submission file. In every case the environment of your current shell is exported to the job, so remember to load any programs you need (like MPI) first, and to set the OpenMP variables if needed:

Csh/Tcsh: setenv OMP_NUM_THREADS <number of threads>
Bash/ksh: export OMP_NUM_THREADS=<number of threads>

You can check what the variable is set to with the command echo $OMP_NUM_THREADS.

srun - running programs/jobs
The command can start multiple tasks on multiple nodes, where each task is a separate process executing the same program. By default, SLURM allocates one processor per task, but starts tasks on multiple processors as necessary.
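To see that srun really starts separate copies of the same program, you can launch a small script as several tasks. This is a minimal sketch: task_info.sh and the task_info function are hypothetical names, not part of SLURM. It relies on SLURM_PROCID (the task number) and SLURM_NTASKS, which srun sets in each task's environment, and falls back to defaults so it also runs outside a job.

```shell
#!/bin/bash
# task_info.sh: hypothetical helper script, not part of SLURM.
# Start it as several tasks, for example:  srun -n 4 ./task_info.sh
# srun starts one separate process per task and sets SLURM_PROCID
# (0..N-1) and SLURM_NTASKS in each process's environment.
# The ":-" fallbacks let the script also run outside SLURM as a single "task".

task_info() {
  # $$ is the PID of this shell, so every task reports its own process.
  echo "task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1} on ${HOSTNAME}, pid $$"
}

task_info
```

After chmod +x task_info.sh, running srun -n 4 ./task_info.sh inside an allocation should print one line per task, with task numbers 0 to 3 in no particular order.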
You do not have to follow SLURM's defaults, however: you can specify the number of nodes, tasks and processors per task yourself.

A few useful options to srun
A program could be run like this (asking for two nodes):

srun -N 2 my_program

Partitions/queues
SLURM does not use queues like PBS does; instead it uses partitions, which serve more or less the same function. For now, only one partition is defined, batch, which everyone runs their jobs in. Information about the SLURM nodes and partitions can be found using this command:

$ sinfo

Topology of the system
The program hwloc-ls is very useful for seeing the topology of your allocation: which processors you got, where they are placed, how much memory there is, etc. This can be used to determine the best placement of tasks, or to check that you got what you thought you asked for! For example, running hwloc-ls as 2 tasks with 2 processors each (-c 2) prints the topology once per task:
$ srun -n 2 -c 2 hwloc-ls
Machine (32GB) + Socket L#0 (32GB)
  NUMANode L#0 (P#6 16GB) + L3 L#0 (5118KB)
    L2 L#0 (512KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#26)
    L2 L#1 (512KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#27)
  NUMANode L#1 (P#7 16GB) + L3 L#1 (5118KB)
    L2 L#2 (512KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#30)
    L2 L#3 (512KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#31)

Machine (32GB) + Socket L#0 (32GB)
  NUMANode L#0 (P#6 16GB) + L3 L#0 (5118KB)
    L2 L#0 (512KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#26)
    L2 L#1 (512KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#27)
  NUMANode L#1 (P#7 16GB) + L3 L#1 (5118KB)
    L2 L#2 (512KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#30)
    L2 L#3 (512KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#31)
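If you only want a one-line summary per task rather than the full hwloc-ls tree, a small script can report which CPUs each task is allowed to run on. This is a sketch under some assumptions: show_cpus.sh and the show_cpus function are hypothetical names, and the script is Linux-specific because it reads Cpus_allowed_list from /proc. Those CPU numbers are OS indices, the same numbering as the P# values in the hwloc-ls output.

```shell
#!/bin/bash
# show_cpus.sh: hypothetical helper, not part of SLURM or hwloc.
# Linux-specific: /proc/<pid>/status has a "Cpus_allowed_list:" line
# listing the CPUs (by OS index) the process may run on.
# Launch it like hwloc-ls, for example:  srun -n 2 -c 2 ./show_cpus.sh

show_cpus() {
  local allowed
  # grep runs as a child process and inherits this task's CPU affinity,
  # so /proc/self/status describes the same CPU set as the task itself.
  allowed=$(grep Cpus_allowed_list /proc/self/status | cut -f2)
  # nproc prints the number of CPUs available to the current process.
  echo "task ${SLURM_PROCID:-0}: $(nproc) CPU(s) allowed: ${allowed}"
}

show_cpus
```

Depending on how task binding is configured, each task may report only its own CPUs or all the CPUs of the allocation; comparing this with what you requested is a quick way to catch a wrong -n or -c.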
A few useful environment variables
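As an illustration, the sketch below prints a few of the variables SLURM sets in a job's environment and uses one of them to set OMP_NUM_THREADS. The variable names are standard SLURM ones, but which of them are set depends on the options you gave (for example, SLURM_CPUS_PER_TASK only appears when -c/--cpus-per-task was used); show_slurm_env is a hypothetical helper name.

```shell
#!/bin/bash
# Print some of the environment variables SLURM sets for a job.
# Run it inside a job script, an salloc shell, or through srun.
# Variables that are not set (e.g. SLURM_CPUS_PER_TASK without -c)
# are shown as "unset".

show_slurm_env() {
  local var
  for var in SLURM_JOB_ID SLURM_JOB_NUM_NODES SLURM_JOB_NODELIST \
             SLURM_NTASKS SLURM_CPUS_PER_TASK; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    echo "${var}=${!var:-unset}"
  done
}

show_slurm_env

# One OpenMP thread per processor allocated to each task; fall back to 1
# when -c/--cpus-per-task was not given.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

Setting OMP_NUM_THREADS from SLURM_CPUS_PER_TASK keeps the thread count in step with the -c value you asked for, so you only have to change it in one place.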



