The batch system policy is fairly simple, and currently states the following:
If you submit a job that needs more than your monthly allocation (remember that running jobs count against it), it will be pending with "Reason=AssociationResourceLimit" or "Reason=AssocMaxCpuMinutesPerJobLimit" until enough running jobs have finished. A job cannot start if it asks for more than your total monthly allocation.
You can see the current priority of your project (and that of others) by running the command sshare and looking for the column marked 'Fairshare', which shows your group's current priority.
The fair-share weight decays gradually over 50 days, meaning that jobs older than 50 days do not count towards priority.
Remember: when and if a job starts depends on which resources it is requesting. If a job asks for, say, 10 nodes and only 8 are currently available, the job will have to wait for resources to free up. Meanwhile, other jobs with smaller requirements will be allowed to start, as long as they do not delay the start time of higher-priority jobs.
The SLURM scheduler divides the job queue into two parts.
Basically, what happens when a job is submitted is this:
When a job is submitted, the SLURM batch scheduler assigns it an initial priority. The priority value may change while the job is waiting, as fair-share usage changes, until the job gets to the head of the queue. This happens as soon as the needed resources are available, provided no jobs with higher priority and matching available resources exist. When a job gets to the head of the queue and the needed resources are available, the job will be started.
At HPC2N, SLURM assigns job priority based on multifactor job priority scheduling. As it is currently set up, only two factors influence job priority: fair-share and partition.
Weights have been assigned to the above factors in such a way that fair-share is the dominant factor. Partition is only a factor for the 'bigmem' partition on Abisko, so that jobs that need to run there have priority for running in that partition.
The following formula is used to calculate a job's priority:
Job_priority = 1000000 * (fair-share_factor) + 10000 * (partition_factor)
Priority is then calculated as a weighted sum of these factors. If you have not asked for the bigmem nodes, the second term of the equation can be ignored.
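As a rough sketch, the formula above can be evaluated like this (the factor values below are made-up examples, not real HPC2N numbers; both factors lie between 0 and 1):

```python
# Sketch of the priority formula above. The weights come from the
# formula in the text; the factor values passed in are illustrative.
def job_priority(fairshare_factor, partition_factor=0.0):
    # Fair-share dominates: its weight is 100x the partition weight.
    return 1000000 * fairshare_factor + 10000 * partition_factor

# A job with fair-share factor 0.5 that does not use the bigmem partition:
print(job_priority(0.5))        # 500000.0
# The same fair-share factor, but requesting bigmem (partition factor 1.0):
print(job_priority(0.5, 1.0))   # 510000.0
```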
The fair-share_factor is dependent on several things, mainly:
You can see the current values of your jobs' fair-share factors with this command:
sprio -l -u <username>
and your own and your project's current fair-share value with
sshare -l -u <username>
Note that these values change over time, as you and your project members use resources, others submit jobs, and time passes.
Note: a job will NOT rise in priority just because it has been sitting in the queue for a long time. No priority is accrued merely from the age of the job.
For more information about how fair-share is calculated in SLURM, please see: http://slurm.schedmd.com/priority_multifactor.html
While a socket physically has 12 cores, for SLURM allocation purposes a socket is 6 cores (a NUMA island), i.e. allocation is in groups of 6 cores. This also means 6 cores is the smallest unit you can allocate.
This is how your project is charged, depending on how many cores you ask for:
| What you ask for | Number of cores you get | What your project is charged |
|------------------|-------------------------|------------------------------|
| 1 core           | 6 cores                 | 6 cores                      |
| 6 cores          | 6 cores                 | 6 cores                      |
| 7 cores          | 12 cores                | 12 cores                     |
| c cores          | 6*ceil(c/6) cores       | 6*ceil(c/6) cores            |
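The charging rule in the table can be sketched as a one-liner (the function name is illustrative):

```python
import math

# Sketch of the Abisko charging rule above: requests are rounded up to
# whole NUMA islands of 6 cores.
def cores_charged(c):
    return 6 * math.ceil(c / 6)

print(cores_charged(1))   # 6
print(cores_charged(6))   # 6
print(cores_charged(7))   # 12
```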
If you request resources using
The allocation policy on Kebnekaise is a little different from that on Abisko, mainly due to the mixture of normal CPUs, GPUs, and KNLs on Kebnekaise; Abisko has only normal CPUs. Thus, Kebnekaise's allocation policy may need a little extra explanation.
Thin (compute) nodes
The compute nodes, or "thin" nodes, are the standard nodes with 128 GB memory.
Note: as long as you ask for no more than the number of cores in one node (28), you will be allocated, and charged for, exactly that number of cores. If you ask for more than 28 cores, you will be allocated whole nodes and charged accordingly.
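The thin-node charging rule can be sketched like this (assuming 28 cores per node, as stated above; the function name is illustrative):

```python
import math

# Sketch of the thin-node charging rule above: requests up to one node
# (28 cores) are charged exactly; larger requests are rounded up to
# whole nodes.
CORES_PER_NODE = 28

def thin_cores_charged(c):
    if c <= CORES_PER_NODE:
        return c
    return CORES_PER_NODE * math.ceil(c / CORES_PER_NODE)

print(thin_cores_charged(10))   # 10
print(thin_cores_charged(28))   # 28
print(thin_cores_charged(29))   # 56
```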
The largemem nodes have 3 TB memory per node.
Note: these nodes are not generally available, and using them requires that your project has an allocation on them.
The LargeMem nodes can be allocated per socket or per node.
When asking for one K80 GPU accelerator card, it means you will get its 2 onboard compute engines (GK210 chips). The GPU nodes have 28 normal cores and 2 K80s (each with 2 compute engines). They are placed together as 14 cores + 1 K80 on a socket. If someone is using the GPU on a socket, it is not possible for someone else to use the normal CPU cores of that socket at the same time.
Because of that, your project will be charged for 14 cores + 2 compute engines if you ask for 1 K80. Each core hour on a compute engine is charged as 10 core hours, i.e. 20 core hours for a K80, so you will be charged 14 + 20 = 34 core hours per hour.
Note that if you ask for 3 K80s, you will be allocated (and charged for) 4 K80s, since the GPU nodes hold 2 K80s each!
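This K80 accounting can be sketched as follows (the function name is illustrative; the 14 cores and the factor 10 per engine core hour come from the text above):

```python
# Sketch of the K80 charging above: each K80 occupies a socket together
# with 14 CPU cores, and each of its 2 compute engines is charged as
# 10 core hours per hour, so one K80 costs 14 + 2 * 10 = 34 core hours
# per hour of wall time.
def k80_core_hours_per_hour(k80s):
    return k80s * (14 + 2 * 10)

print(k80_core_hours_per_hour(1))   # 34
# A request for 3 K80s is allocated as 4 (see the note above), so:
print(k80_core_hours_per_hour(4))   # 136
```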
The KNL nodes are only allocated on a per-node basis, meaning that you will be allocated (and charged for) whole nodes, rounded up, even if you ask for less than a full node.
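The per-node rounding can be sketched like this (the number of cores per KNL node is left as a parameter, since the text does not state it; the function name is illustrative):

```python
import math

# Sketch of per-node KNL charging: any request is rounded up to whole
# nodes. cores_per_node is a parameter because the text above does not
# give the KNL core count.
def knl_cores_charged(requested_cores, cores_per_node):
    return cores_per_node * math.ceil(requested_cores / cores_per_node)

# Assuming, purely for illustration, 68 cores per node:
print(knl_cores_charged(10, 68))    # 68: a partial-node request is charged as a full node
```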