Batch systems

The Batch system

Once a parallel program has been successfully compiled, it can be run on multi-processor/multi-core compute nodes directly or, in a production environment, through a batch system. A batch system keeps track of available system resources and schedules the jobs of multiple users running their tasks simultaneously, typically organizing submitted jobs into some form of prioritized queue. The batch system is also used to enforce local resource-usage and job-scheduling policies.

HPC2N currently has two clusters that accept local batch jobs: Abisko and Kebnekaise. Both run SLURM, an open-source job scheduler that provides three key functions.

  • First, it allocates exclusive or non-exclusive access to resources to users for some period of time.
  • Second, it provides a framework for starting, executing, and monitoring work on a set of allocated nodes (the cluster).
  • Third, it manages a queue of pending jobs, in order to distribute work across resources according to policies. 

SLURM is designed to handle thousands of nodes in a single cluster, and can sustain throughput of 120,000 jobs per hour.
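As a sketch of how a user typically interacts with SLURM, the snippet below writes a minimal job script and shows the submission commands. The account name, task count, and program name are placeholders, not values specific to Abisko or Kebnekaise.

```shell
# Create a minimal SLURM job script (account and program names are hypothetical).
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=myproject      # project/account to charge (placeholder)
#SBATCH --ntasks=4               # number of parallel tasks requested
#SBATCH --time=00:10:00          # wall-clock limit, hh:mm:ss
srun ./my_parallel_program       # launch the program on the allocated resources
EOF

# Submit the job to the queue (only works on a cluster running SLURM):
#   sbatch hello_job.sh
# Inspect your pending and running jobs:
#   squeue -u $USER
```

The `#SBATCH` directives are read by the scheduler at submission time; the rest of the script runs as an ordinary shell script once the requested resources have been allocated.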

Updated: 2017-12-14, 12:27