Support & Documentation
To see your own jobs in the queue, use
$ squeue -u <username>
To launch a program under SLURM, use srun:
$ srun <commands for your job/program>
To see the status of your job(s), use
scontrol show job <jobid>
or
squeue -l -u <username>
To make a job eligible for requeueing (for example if a node fails), add
#SBATCH --requeue
to your submit script.
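As an illustration, a minimal submit script with requeueing enabled might look like the sketch below (the project account, job name, core count, walltime, and program name are placeholders, not taken from this page):

#!/bin/bash
# Project account to charge (placeholder)
#SBATCH -A hpc2nXXXX-YYY
#SBATCH -J my_job
#SBATCH -n 28
#SBATCH --time=01:00:00
# Make the job eligible for requeueing
#SBATCH --requeue

srun ./my_program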
To see detailed accounting information for a job, including its memory usage, use
$ sacct -l -j <jobid> -o jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize
If your job is pending with "Reason=AssociationResourceLimit" or "Reason=AssocMaxCpuMinutesPerJobLimit", your currently running jobs already allocate the entire footprint allowance for your project. The job will start when enough of your running jobs have finished that you are below the limit again.
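To check the reason a job is pending, one way (a sketch using standard squeue/scontrol options; the exact output fields may vary with SLURM version) is:

$ squeue -u <username> -t PENDING -l
$ scontrol show job <jobid> | grep Reason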
Another possibility is that your job requests more resources (more core hours) than your allocation permits. Remember: <cores requested> x <walltime> = <core hours requested>. NOTE: On Kebnekaise, if you ask for more than 28 cores, you are accounted for a whole number of nodes, rounded up (e.g. 29 cores -> 2 nodes).
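As a worked example (the numbers are illustrative only), consider a job asking for 29 cores and 10 hours of walltime on Kebnekaise:

29 cores requested -> rounded up to 2 nodes = 2 x 28 = 56 cores
56 cores x 10 hours walltime = 560 core hours accounted (not 29 x 10 = 290)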
If you are running an MPI code, then you need to use
srun <flags> <program>
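A minimal MPI submit script might look like the sketch below (account, task count, walltime, and program name are placeholders; adjust them to your job):

#!/bin/bash
#SBATCH -A hpc2nXXXX-YYY
# Number of MPI tasks
#SBATCH -n 56
#SBATCH --time=00:30:00

# Load the toolchain your program was built with, e.g.
# module load foss
# Launch the MPI program with srun
srun ./my_mpi_program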
To see how much of your project's allocation has been used, run
$ sreport cluster AccountUtilizationByUser start=MM/DD/YY end=MM/DD/YY Accounts=hpc2nXXXX-YYY
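For example, to see the usage of a (hypothetical) project hpc2n2023-001 during January 2023:

$ sreport cluster AccountUtilizationByUser start=01/01/23 end=01/31/23 Accounts=hpc2n2023-001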
The stack limit on Kebnekaise is now unlimited by default, and you no longer need to pass the flag --propagate=STACK to srun.
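If you want to verify the limit from inside a job, a quick check (a sketch; it starts a one-task job step) is:

$ srun -n 1 bash -c 'ulimit -s'

which should print "unlimited".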
You access the GPU nodes on Kebnekaise with
#SBATCH --gres=gpu:k80:x
where x = 1, 2, or 4. More information can be found on the "SLURM GPU Resources (Kebnekaise)" page.
NOTE: your project needs to have an allocation of time on the GPU nodes in order to use them, as they are now considered a separate resource. You do not have to specify a particular partition in the job script, though; it is handled through the above SLURM directive.
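A minimal GPU submit script could look like the sketch below (account and program names are placeholders; here x=2, i.e. two K80 resources are requested):

#!/bin/bash
#SBATCH -A hpc2nXXXX-YYY
#SBATCH -n 1
#SBATCH --time=01:00:00
# Request 2 K80 GPU resources on one node
#SBATCH --gres=gpu:k80:2

# Load a CUDA-aware toolchain if needed, e.g.
# module load fosscuda
srun ./my_gpu_program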
You access the KNL nodes from kebnekaise-knl.hpc2n.umu.se. You need to specify the "knl" partition (#SBATCH -p knl) and request 4 threads per core (#SBATCH --threads-per-core=4). More information can be found on the "Using the KNL nodes" page (/resources/hardware/kebnekaise/knl).
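A minimal KNL submit script could look like this sketch (account and program names are placeholders):

#!/bin/bash
#SBATCH -A hpc2nXXXX-YYY
# Use the KNL partition
#SBATCH -p knl
# One KNL node, 4 hardware threads per core
#SBATCH -N 1
#SBATCH --threads-per-core=4
#SBATCH --time=01:00:00

srun ./my_knl_program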
SLURM provides several environment variables which can be put to use in your submit script. For a full list of the available variables, see the sbatch man page, section 'OUTPUT ENVIRONMENT VARIABLES'.
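For instance, a few commonly used variables could be printed from a submit script as in the sketch below (account, task count, walltime, and program name are placeholders; the variables themselves are standard SLURM output environment variables):

#!/bin/bash
#SBATCH -A hpc2nXXXX-YYY
#SBATCH -n 4
#SBATCH --time=00:10:00

# Print some information about the allocation
echo "Job ID:           $SLURM_JOB_ID"
echo "Allocated nodes:  $SLURM_JOB_NODELIST"
echo "Number of tasks:  $SLURM_NTASKS"
echo "Submit directory: $SLURM_SUBMIT_DIR"

srun ./my_program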
The variable SLURM_JOB_NODELIST provides the list of nodes allocated to your job. The nodes are listed in a compact form, for example 't-cn[0211,0216-0217]', which specifies the nodes t-cn0211, t-cn0216 and t-cn0217.
This list can be manipulated in various ways with the 'hostlist' command. Assuming SLURM_JOB_NODELIST contains the nodes listed above, here are several examples:
$ hostlist -e $SLURM_JOB_NODELIST
t-cn0211
t-cn0216
t-cn0217
$ hostlist -e -s',' $SLURM_JOB_NODELIST
t-cn0211,t-cn0216,t-cn0217
$ hostlist -n $SLURM_JOB_NODELIST
3
$ hostlist -e -o 1 -l 1 $SLURM_JOB_NODELIST
t-cn0216
For a full list of options, run
hostlist --help
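As an example of putting this to use (a sketch relying only on the -e option shown above), you could loop over the allocated nodes in a submit script:

# Expand the compact node list and loop over the nodes
for node in $(hostlist -e $SLURM_JOB_NODELIST); do
    echo "Allocated node: $node"
done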
If you still have jobs in the queue when your project expires, the priority of those jobs is lowered drastically. The jobs will not be removed, however, and you can change the project account for a job with this command
scontrol update job=<jobid> account=<newproject>
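If you want to move all of your pending jobs to the new project at once, a sketch using standard squeue options is:

# List the job IDs of your pending jobs and update their account
for jobid in $(squeue -h -u <username> -t PENDING -o "%i"); do
    scontrol update job=$jobid account=<newproject>
done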