HPC2N - Support - Access: Short quick-start guide - Akka

Short Quick-start Guide - Akka

 


This is a quickstart guide to using the compute clusters on HPC2N. In the text below, the 'Akka' cluster is used as an example.

Click here for a step-by-step guide to running your first job.

Logging on to 'Akka'

Follow the instructions below to log on to 'Akka'.

If this is the first time you are using any of the HPC2N facilities, please change your password after you have logged in. See Accessing the systems at HPC2N for more information.

Access to 'Akka' is possible using SSH, Kerberos telnet, or GSIAPI. The example below uses SSH. For information on Kerberos telnet and GSIAPI, see the section about Login/password.

  • enter: ssh yourusername@akka.hpc2n.umu.se
  • give password

Choosing where to store your files

Storing your files in the right location is essential when working with the batch system. See the section about the different file systems at HPC2N for more information.

If unsure, start by using the GPFS 'parallel' file system.

Submit a job to the batch system

A computing task submitted to a batch system is called a job. Jobs can be submitted in two ways: a) from the command line or b) using a job script. We recommend using a job script, as it makes troubleshooting easier and also lets you keep track of the batch system parameters you have used in the past.

To create a new job script (also called a submit script or a submit file) you need to:

  • open a new file in one of our text editors (nano, emacs or vim);
  • write a job script including batch system directives;
  • remember to add any 'module load' commands that are needed, since the environment of your shell is not transferred to the job unless you ask for it explicitly;
  • save the file and submit it to the batch system queue using the qsub command.
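
The steps above can be sketched as a short shell session. This is only an illustration: the file name myjob.sh, the job name myjob, the program ./my_program and the resource values are placeholders chosen for the example, not HPC2N defaults. The script is written with a here-document instead of a text editor so that the example is self-contained.

```shell
#!/bin/bash
# Write a minimal PBS submit file. The file name, job name, program name
# and resource values are illustrative placeholders.
write_job_script() {
    local script="$1"
    cat > "$script" <<'EOF'
#!/bin/bash
#PBS -N myjob
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# Start in the directory from which qsub was run
cd $PBS_O_WORKDIR

# Load the modules the program needs; the shell environment
# is not transferred to the job automatically
module add openmpi/psc

./my_program
EOF
}

write_job_script myjob.sh

# Submit only where the batch system is actually available
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.sh
else
    echo "qsub not found; myjob.sh was written but not submitted"
fi
```

Keeping the submit file on disk, rather than passing options on the command line, leaves a record of the parameters that were used for each run.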

There are several examples and more information about using the batch system and writing scripts in the subsection for the batch system.

Here we demonstrate the usage of the batch system directives with the following simple example of a submit file for the pingpong program:

#!/bin/bash
###PBS -A SNICXXX-YY-ZZ
#PBS -N Parallel
#PBS -o test.out
#PBS -e test.err
#PBS -m ae
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:30:00
###PBS -l pmem=2200mb
###PBS -l pvmem=2900mb

cd $PBS_O_WORKDIR
module add openmpi/psc

# pingpong only works on two nodes or cpus,
# -pernode makes sure that we only run one pingpong 
# per node

mpiexec -pernode ./pingpong

Batch system directives start with #PBS. The first line specifies that the bash shell will be used to interpret the job script.

  • If applicable, -A specifies the SNAC project ID, formatted as SNICXXX-YY-ZZ (if omitted, it defaults to no project; spaces and slashes are not allowed in the project ID). Note that in the above example the line is commented out using two extra hash marks; 
  • -N is a job name (default is the name of the submit file); 
  • -o and -e specify paths to the standard output and standard error files (defaults to jobname.[eo]jobnumber); 
  • -m requests sending of an e-mail on aborted job (a), job beginning (b), job end (e) (defaults to a); 
  • -l specifies requested resources (a resource list) which can be:
    • nodes specifies the number of requested nodes (default is 1); ppn (processors per node) specifies how many processors you want to reserve on each node (default is 1). Note 1: to reserve a full node, use ppn=8 on Akka. Note 2: when nodes is specified without ppn, it is interpreted as the number of processors instead; 
    • walltime is the real time (as opposed to the processor time) that should be reserved for the job; 
    • pmem is a limit (enforced by the batch system scheduler) on the maximum amount of physical memory a process can use. The batch system will allocate such a job to a combination of nodes where each node has at least pmem * requested-number-of-processors-per-node of memory (default is 1900mb on Akka). Note: if you specify a combination (mem/procs) that requires more memory per processor than the nodes have, the job will not start; 
    • pvmem is a limit (enforced by the operating system) on the maximum amount of virtual memory (physical RAM + swap) a process can use. The batch system will allocate such a job to nodes with at least pvmem of virtual memory available per requested processor (default is 2000mb on Akka). Usually pvmem is set somewhat higher than pmem. Note: if you specify more memory per processor than the nodes have, the job will not start.
  • By default your job's working directory is your home directory. To change that, use the batch system variable $PBS_O_WORKDIR, which contains the absolute path of the directory in which the qsub command was run. For other job environment variables, please refer to the qsub manual page;
  • Before running the program it is necessary to load the appropriate module for the MPI to have access to relevant parallel libraries (see our modules page); 
  • Run your parallel program using mpiexec. Depending on how your program works, you may want to reserve a full node but run only one process on it (i.e. on one processor). In that case, pass the -pernode parameter to mpiexec. This is typical for multi-threaded applications, where a single process uses multiple execution threads that can make use of more than one processor on a node. In the above pingpong example we use -pernode to ensure that only one process runs per node.
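
As a worked example of the resource arithmetic described above, the sketch below takes the values from the pingpong submit file (nodes=2:ppn=2) together with the commented-out pmem=2200mb line, and computes how many processors are reserved and how much physical memory each allocated node must provide. The function names total_procs and node_mem_mb are hypothetical helpers introduced for this illustration; they are not batch system commands.

```shell
#!/bin/bash
# Illustrative arithmetic for a request such as:
#   #PBS -l nodes=2:ppn=2
#   #PBS -l pmem=2200mb
# The helper names are hypothetical, not part of the batch system.

# Total processors reserved: nodes * ppn
total_procs() {
    local nodes="$1" ppn="$2"
    echo $(( nodes * ppn ))
}

# Physical memory each allocated node must provide: pmem * ppn (in MB)
node_mem_mb() {
    local pmem_mb="$1" ppn="$2"
    echo $(( pmem_mb * ppn ))
}

nodes=2; ppn=2; pmem_mb=2200
echo "processors reserved: $(total_procs "$nodes" "$ppn")"
echo "memory needed per node: $(node_mem_mb "$pmem_mb" "$ppn") MB"
```

With these values the job reserves 4 processors and needs nodes with at least 4400 MB of physical memory each. If the per-node figure exceeds what the nodes have, the job will not start, as noted above.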

Batch system commands

There is a set of batch system commands available to users for managing their jobs. The following is a list of commands useful to end-users: 

  • qsub <submit_file> submits a job to the batch system (if there were no syntax errors in the submit file the job is processed and inserted into the job queue, the integer job ID is printed on the screen); 
  • showq shows the current job queue (grouped by running, idle and blocked jobs); 
  • checkjob <jobid> shows detailed information about a specific job; 
  • qdel <jobid> deletes a job from the queue.
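
A typical sequence with these commands might look like the sketch below. It assumes that qsub prints the job ID on standard output, possibly followed by a ".server" suffix on some installations (that suffix is an assumption, so it is stripped defensively). The helper names job_number and submit_and_check, and the file name myjob.sh, are hypothetical.

```shell
#!/bin/bash
# Strip an optional ".server" suffix from a job ID (assumption: some
# installations print IDs such as 12345.hostname instead of 12345).
job_number() {
    echo "${1%%.*}"
}

# Submit a script and show its status.
# Returns non-zero if qsub is missing or the submission fails.
submit_and_check() {
    local script="$1" jobid
    command -v qsub >/dev/null 2>&1 || { echo "qsub not available" >&2; return 1; }
    jobid=$(qsub "$script") || return 1
    jobid=$(job_number "$jobid")
    echo "submitted job $jobid"
    checkjob "$jobid"
}

# Example use on the cluster:
#   submit_and_check myjob.sh
#   showq              # list running, idle and blocked jobs
#   qdel <jobid>       # remove the job if it is no longer needed
```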

Additional information

More information on batch systems can be found on the Internet. We recommend visiting the following pages (keep in mind that some information may not apply to the HPC2N environment):

  • Batch system and batch jobs: University of Alberta webpage.
  • Multiple Threaded Applications: University of Birmingham webpage (see page bottom).

Follow the instructions below to compile a parallel program

  • Use your own code or download a small example that sends messages between two nodes using MPI (uses standard output).
  • Download a C Makefile: Makefile
  • Download a Fortran Makefile: Makefile
  • Edit the Makefile
    • Specify the files you want to compile in the Makefile
      (change from pingpong to the real name)
  • To set up the environment for working with MPI compilers, you need to add the appropriate module. For example, to enable the PathScale(TM) OpenMPI compilers enter:

    $ module add openmpi/psc

    To see which other compilers are available, enter:

    $ module avail

  • Enter the following to make an executable from C or Fortran source code:

    $ make -f Makefile