Job Submission
There are three ways to run a job with SLURM.

Command line
A job can simply be submitted from the command line with srun:

    srun -N 2 --exclusive --time=00:30:00 my_program

This example asks for exclusive use of two nodes to run the program my_program, with a time limit of 30 minutes. Since the number of tasks has not been specified, SLURM assumes the default of one task per node. Note that the --exclusive parameter guarantees that no other jobs will run on the allocated nodes. Without the --exclusive parameter, SLURM would only allocate the minimum assignable resource on each node, which on Abisko is defined as 12 cores.

When submitting a job this way, you give all the commands on the command line, and then you wait for the job to pass through the job queue, run, and complete before the shell prompt returns and you can continue typing commands. This is a good way to run quick jobs and get accustomed to how SLURM works, but it is not the recommended way of running longer programs or MPI programs; these types of jobs should run as a batch job with a Job Submission File.

Job Submission File
Instead of submitting the program directly to SLURM with srun from the command line, you can submit a batch job with sbatch. This has the advantage that you do not have to wait for the job to start before you can use your shell prompt again.

Before submitting a batch job, you first write a job submission file, which is an executable shell script. It contains all the environment setup, commands and arguments to run your job (other programs, MPI applications, srun commands, shell commands, etc.). When your job submission file is ready, you submit it to the job queue with sbatch. sbatch will add your job to the queue and return immediately, so you can continue to use your shell prompt. The job will run when resources become available. When the job is complete, you will get a file named slurm-<jobid>.out containing the output from your job.
This file will be placed in the directory from which you submitted your job. Here is an example use of sbatch:

    sbatch jobXsubmit.sh

Interactive
If you would like to allocate resources on the cluster and then use those resources interactively, you can do so with the salloc command. This can be useful for debugging, in addition to debugging tools like DDT. First, you make a request for resources with salloc:

    salloc -N 1 -n 4 --time=1:30:00

The example above will allocate resources for up to 4 simultaneous tasks on 1 node for 1 hour and 30 minutes. Your request enters the job queue just like any other job, and salloc will tell you that it is waiting for the requested resources. When salloc tells you that your job has been allocated resources, you can interactively run programs on those resources with srun. The commands you run with srun will then be executed on the resources your job has been allocated.

NOTE: After salloc tells you that your job resources have been granted, you are still using a shell on the login node. You must submit all commands with srun to have them run on your job's allocated resources. Commands run without srun will be executed on the login node. This is demonstrated in Example 1.

Example 1 - 1 node, resources for 4 parallel tasks

    t-mn01 [~/pfs]$ salloc -N 1 -n 4 --time=1:00:00
    salloc: Pending job allocation 42973
    salloc: job 42973 queued and waiting for resources
    salloc: job 42973 has been allocated resources
    salloc: Granted job allocation 42973
    t-mn01 [~/pfs]$ echo $SLURM_NODELIST
    t-cn0122
    t-mn01 [~/pfs]$ srun hostname
    t-cn0122.hpc2n.umu.se
    t-cn0122.hpc2n.umu.se
    t-cn0122.hpc2n.umu.se
    t-cn0122.hpc2n.umu.se
    t-mn01 [~/pfs]$ hostname
    t-mn01.hpc2n.umu.se

Example 2 - 2 nodes, resources for 4 parallel tasks

    t-mn01 [~/pfs]$ salloc -N 2 -n 4 --time=1:00:00
    salloc: Pending job allocation 42974
    salloc: job 42974 queued and waiting for resources
    salloc: job 42974 has been allocated resources
    salloc: Granted job allocation 42974
    t-mn01 [~/pfs]$ echo $SLURM_NODELIST
    t-cn[0122,0130]
    t-mn01 [~/pfs]$ srun hostname
    t-cn0122.hpc2n.umu.se
    t-cn0122.hpc2n.umu.se
    t-cn0122.hpc2n.umu.se
    t-cn0130.hpc2n.umu.se
    t-mn01 [~/pfs]$

Note that SLURM determined where to allocate resources for the 4 tasks on the 2 nodes. In this case, three tasks were run on t-cn0122 and one on t-cn0130. If needed, you can control how many tasks run on each node with --ntasks-per-node=<number>.
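
As a rough illustration of the job submission file described in the Job Submission File section, the sketch below writes a small batch script and checks it for shell syntax errors. The file name jobXsubmit.sh and the program name my_program are the placeholders used earlier on this page. The resource values (2 nodes, 2 tasks per node, 30 minutes) are assumptions chosen only for illustration, not site recommendations.

```shell
#!/bin/bash
# Write a minimal job submission file (a sketch; adjust resources to your job).
cat > jobXsubmit.sh <<'EOF'
#!/bin/bash
# Request 2 nodes with exactly 2 tasks on each node (4 tasks in total).
#SBATCH -N 2
#SBATCH --ntasks-per-node=2
# Time limit of 30 minutes.
#SBATCH --time=00:30:00

# srun launches one copy of the program per task inside the allocation.
# my_program is the placeholder program name from the srun example.
srun my_program
EOF

# Check the script for shell syntax errors without running it.
bash -n jobXsubmit.sh && echo "syntax ok"

# Submit it on the cluster (requires SLURM, so it is left commented out here):
# sbatch jobXsubmit.sh
```

With --ntasks-per-node=2, the four tasks would be spread evenly over the two nodes instead of the 3 + 1 split that SLURM chose in Example 2. The #SBATCH lines must come before the first command in the script, since sbatch stops reading them once it reaches a non-comment line.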



