Tips, SLURM

Job status

  • To see only your own jobs, and no others, run
    $ squeue -u <username>
    

Interactive running

  • Using salloc, you get an interactive shell to run your jobs in once your nodes are allocated. This works like an interactive shell (-I) in PBS, including the fact that you cannot use the terminal while you wait, perhaps for a long time, before the job starts.
  • You MUST use srun to run anything on the nodes you allocate with salloc. The shell you are standing in is still on the login node, and if you do not use srun, you just run on the login node instead. This is potentially very disruptive, since the login node can be slowed down a lot for everyone. Thus, you should always use srun! Run with
    $ srun <commands for your job/program>
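
    For example, to allocate four cores for 30 minutes and then run a program on the allocation (the project ID, core count, time limit, and program name are placeholders):

    $ salloc -A hpc2nXXXX-YYY -n 4 --time=00:30:00
    $ srun ./my_program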
    

General hints

  • Remember, in SLURM, your batch job starts to run in the directory from which you submitted the script. This means you do NOT have to change to that directory like you do in PBS systems.
  • By default, SLURM may place other tasks, both your own and others', on the node(s) you are using. It is possible to ask for an entire node, and since SLURM does not distinguish between your own jobs and the jobs of others, this means the node will not be shared between your own tasks either. This is useful if you, say, need the whole InfiniBand bandwidth, or all the memory on the node. However, remember that if you allocate an entire node for yourself, even if you only run on one or two cores, you will still be 'charged' for the whole node from your project allocation, so only do this if you actually need it!
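    One way to ask for an entire node is to add the --exclusive flag to your submit script:

    #SBATCH --exclusive
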
  • We strongly recommend that you do NOT include a command for the batch system to send an email when the job has finished, particularly if you are running large numbers of jobs. The reason is that many mail servers have a limit and may temporarily block accounts (or domains) that send too many emails. Instead use
    scontrol show job <jobid>

    or

    squeue -l -u <username>

    to see the status of your job(s).

  • In some situations, a job may die unexpectedly, for instance if a node crashes. At HPC2N SLURM has been configured NOT to requeue and restart jobs automatically. If you do want your job to requeue, add the command
    #SBATCH --requeue

    to your submit script.

  • The command sacctmgr can, with the right flags, give a lot of useful information about accounts, users, and limits.
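    For example, to see which accounts (projects) your user is associated with:

    $ sacctmgr show associations user=<username>
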
  • sacct can be used to get info on the use of memory and other resources for a job, for example:
    $ sacct -l -j <jobid> -o jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize
  • The smallest allocatable unit on Kebnekaise is a single core.
  • You must give a project account number for the job to be accepted by the job scheduler. There is no default partition.
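    For example, in your submit script (with a placeholder project ID of the form used at HPC2N):

    #SBATCH -A hpc2nXXXX-YYY
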
  • If you see your job is in the state ReqNodeNotAvail, it is usually because there is a maintenance window scheduled and your job would overlap that period. Check the System News to see if there is a maintenance window scheduled! As soon as the service is done, the reservation is released and the job should start as normal.
  • If your job is pending with "Reason=AssociationResourceLimit" or "Reason=AssocMaxCpuMinutesPerJobLimit", it is because your currently running jobs take up your project's entire allocation. The job will start when enough of your running jobs have finished that you are below the limit.

    Another possibility is that your job is requesting more resources (more core hours) than your allocation permits. Remember: <cores requested> x <walltime> = <core hours you are requesting>. NOTE: On Kebnekaise, if you ask for more than 28 cores, you are charged for a whole number of nodes, rounded up (e.g. 29 cores -> 2 nodes).
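    For example, a job asking for 28 cores and 10 hours of walltime requests 28 x 10 = 280 core hours.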

  • If you are running an MPI code, you need to launch it with

    srun <flags> <program>
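
    For example, a minimal submit script for an MPI program could look like this (the project ID, task count, time limit, and program name are placeholders):

    #!/bin/bash
    #SBATCH -A hpc2nXXXX-YYY
    #SBATCH -n 56
    #SBATCH --time=01:00:00

    srun ./my_mpi_program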

  • sreport is useful for getting information about many things, for instance the usage per user in a project. The example below gives usage per user, for a period given with 'start' and 'end', for the project with account number hpc2nXXXX-YYY (accounts can be of the form hpc2nXXXX-YYY, snicXXX-YY-ZZ, or naissXXXX-YY-ZZ). Note: the letters in the account number must be given in lower case!
    $ sreport cluster AccountUtilizationByUser start=MM/DD/YY end=MM/DD/YY Accounts=hpc2nXXXX-YYY
  • The stack size limit on Kebnekaise is now unlimited by default, and you no longer need to use the flag --propagate=STACK with srun.

  • How you access the GPU nodes on Kebnekaise depends on which types you want to use:

    • K80

      #SBATCH --gres=gpu:k80:x

      where x = 1, 2 or 4. More information on the "SLURM GPU Resources (Kebnekaise)" page.

    • V100

      #SBATCH --gres=gpu:v100:x

      where x = 1 or 2. More information on the "SLURM GPU Resources (Kebnekaise)" page.

    • A100

      #SBATCH -p amd_gpu
      #SBATCH --gres=gpu:a100:x

      where x = 1 or 2. More information on the "SLURM GPU Resources (Kebnekaise)" page.

    • NOTE: regardless of which type of GPUs you want to use, your project needs to have time allocated on the GPU nodes in order to use them, as they are now considered a separate resource. You do not have to add a specific partition in the job script though, as it is handled through the SLURM directives above.
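
    For example, a minimal submit script asking for one V100 GPU could look like this (the project ID, time limit, and program name are placeholders):

      #!/bin/bash
      #SBATCH -A hpc2nXXXX-YYY
      #SBATCH --time=00:30:00
      #SBATCH --gres=gpu:v100:1

      srun ./my_gpu_program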

Environment variables

SLURM provides several environment variables which can be used in your submit script. For a full list of available variables, see the sbatch man page, section 'OUTPUT ENVIRONMENT VARIABLES'.

SLURM_JOB_NODELIST

This variable provides a list of nodes that are allocated to your job. The nodes are listed in a compact form, for example 'b-cn[0211,0216-0217]' which specifies the nodes:

  • b-cn0211
  • b-cn0216
  • b-cn0217

This list can be manipulated in various ways with the 'hostlist' command. Assuming SLURM_JOB_NODELIST contains the nodes listed above, here are several examples:

$ hostlist -e $SLURM_JOB_NODELIST
b-cn0211
b-cn0216
b-cn0217

$ hostlist -e -s',' $SLURM_JOB_NODELIST
b-cn0211,b-cn0216,b-cn0217

$ hostlist -n $SLURM_JOB_NODELIST
3

$ hostlist -e -o 1 -l 1 $SLURM_JOB_NODELIST
b-cn0216

For a full list of hostlist options, type:

$ hostlist --help
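
Inside a submit script, SLURM_JOB_NODELIST can for instance be used to record which nodes a job ran on (a small sketch; SLURM_JOB_ID is another of the variables set by SLURM):

echo "Job $SLURM_JOB_ID ran on $(hostlist -e -s',' $SLURM_JOB_NODELIST)"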

Expiring projects

If you still have jobs in the queue when your project expires, the priority of those jobs is lowered drastically. However, your jobs will not be removed, and you can change the project account for a job with this command

scontrol update job=<jobid> account=<newproject> 
Updated: 2024-03-19, 10:33