HPC2N - Support - The Batch system: Scripts

Scripts

 

A job submission script can contain any of the commands that you would otherwise issue yourself from the command line. It is, for example, possible to both compile and run a program, and also to set any necessary environment variables. The results from compiling or running your programs can usually be seen after the job has completed. They will in general be created in the directory you are running in, and be named <script_name>.e<job number> (containing any errors) and <script_name>.o<job number> (any output produced to screen), unless you have given them other names in your script. Of course, if your program writes results to a file, this will still happen. The job number is a number given to every job by PBS. You will see it reported when the job has been submitted.
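As a small illustration of the default naming, the sketch below derives the two file names from a script name and a job identifier. The helper name default_job_files and the identifier format "12345.akka" are assumptions made for this example only; use whatever identifier qsub actually reports.

```shell
#!/bin/bash
# Hypothetical helper: print the default error and output file names
# for a given script name and job identifier.
default_job_files() {
    local script_name=$1
    local job_id=$2
    # Assumption: the identifier looks like "<number>.<server>";
    # keep only the leading number. A plain number is left unchanged.
    local job_number=${job_id%%.*}
    echo "${script_name}.e${job_number}"
    echo "${script_name}.o${job_number}"
}

default_job_files myjob.sh 12345.akka
```

If the script sets -N, -o or -e, the names it gives take the place of these defaults.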

Note that it can take a long time before a job finishes running. The time depends, among other things, on the number of nodes and other resources requested, the size of the program, and how many other people are using the system at the same time.

A job submission script can be very simple, with most of the job attributes specified on the command line, or it may consist of PBS directives, comments and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command line options.
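The "very simple" variant might look like the sketch below: a script with no PBS directives at all, with every attribute given as a qsub option instead. The file name simple_job.sh, the program ./my_program and the resource values are hypothetical. The qsub command is only printed here, so the snippet can be tried without a batch system.

```shell
#!/bin/bash
# Write a minimal job script that contains no PBS directives.
# simple_job.sh and ./my_program are hypothetical names.
cat > simple_job.sh <<'EOF'
#!/bin/bash
cd "$PBS_O_WORKDIR"
./my_program
EOF

# All job attributes are then passed as qsub options.
# The command is printed rather than executed.
submit_cmd="qsub -A SNICXXX-YY-ZZ -N Simple -l nodes=1:ppn=1 -l walltime=00:10:00 simple_job.sh"
echo "$submit_cmd"
```

The same attributes could equally be written as #PBS lines inside the script, as in the longer example that follows.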

Naming: You can name your script anything, including the suffix. It does not matter. Just name it something that makes sense to you and helps you remember what the script is for.

Note that you always have to include #!/bin/bash at the beginning of the script, since bash is the only supported shell. Some things may work under other shells, but not everything.

The environment variable PBS_O_WORKDIR contains the directory you submitted your job script from.

Example (akka):

#!/bin/bash
#PBS -A SNICXXX-YY-ZZ
#PBS -N Parallel
# name of the output file
#PBS -o test.out
# name of the error file
#PBS -e test.err
# when to send email
#PBS -m ae
# Email address to use, if not using the one 
# for the submitter. 
#PBS -M username@domain.name
# asking for 2 nodes, 2 processors per node
#PBS -l nodes=2:ppn=2
# the job can use up to 30 minutes to run
#PBS -l walltime=00:30:00
# memory requirements, physical memory (pmem) 
# at least 2200 mb and virtual + physical memory 
# (pvmem) at least 2900 mb 
#PBS -l pmem=2200mb
#PBS -l pvmem=2900mb

# change to the directory the job was submitted 
# from and load the module for PathScale compilers 
# with OpenMPI 
cd $PBS_O_WORKDIR
module add openmpi/psc

# the program pingpong only works on two nodes 
# or cpus, -pernode makes sure that we only run 
# one pingpong per node

mpiexec -pernode ./pingpong

One (or more) # in front of a text line means it is a comment. #PBS is used to signify a PBS directive. To comment out a directive, put one more # in front of #PBS. #PBS -A <project number> tells PBS that the running time should be taken from that project's allocation. The -N Job-name, if given, replaces script_name in the names of the error and output files.

Note:
It is important to use capital letters for #PBS. Otherwise the line will be taken for a comment and will be ignored.
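A quick way to check which lines of a script will count as directives is to search for lines that start with "#PBS". The sketch below writes a small sample file (directive_demo.sh is a hypothetical name) and uses grep for this. It only approximates what qsub does and is not how qsub itself is implemented.

```shell
#!/bin/bash
# Sample script with one active directive, one commented-out directive
# and one lower-case line that is treated as a plain comment.
cat > directive_demo.sh <<'EOF'
#!/bin/bash
#PBS -N Parallel
##PBS -A SNICXXX-YY-ZZ
#pbs -l walltime=00:30:00
# an ordinary comment
echo "hello"
EOF

# grep is case-sensitive by default, so "#pbs" does not match,
# and "##PBS" does not start with "#PBS".
active=$(grep -c '^#PBS' directive_demo.sh)
echo "active directives: $active"
grep '^#PBS' directive_demo.sh
```

Only the -N line is reported. The project line is commented out, and the walltime line is ignored because of the lower-case letters.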

The first line in the script above says that the Linux shell bash will be used to interpret the job script.

  • -A If applicable, specifies the SNAC project ID formatted as SNICXXX-YY-ZZ (if omitted it defaults to no project; spaces and slashes are not allowed in the project ID). Note that the line can be commented out by using two hash-marks (##PBS);
  • -N is a job name (default is the name of the submit file);
  • -o and -e specify paths to the standard output and standard error files (defaults to jobname.[eo]jobnumber);
  • -m requests sending of an e-mail on aborted job (a), job beginning (b), job end (e) (defaults to a);
  • -l specifies requested resources (a resource list) which can be:
    • nodes specifies the number of requested nodes (default is 1); ppn (processors per node) specifies how many processors you want to reserve on each node (default is 1) (Note 1: to reserve the full node use ppn=8 on Akka; Note 2: when nodes are specified without ppn the meaning changes to the number of processors instead);
    • walltime is the real time (as opposed to the processor time) that should be reserved for the job;
    • pmem is a limit (enforced by the batch system scheduler) defining the maximum amount of physical memory a process can use. The batch system will allocate such a job to a combination of nodes where each node has at least pmem * requested-number-of-processors-per-node (default is 1900mb on Akka) (Note: if you specify a combination (mem/procs) that requires more memory per processor than the nodes have, the job will not start.);
    • pvmem is a limit (enforced by the operating system) defining the maximum amount of virtual memory (physical RAM + swap) a process can use. The batch system will allocate such a job to nodes with at least pvmem of virtual memory available per requested processor (default is 2000mb on Akka). Usually pvmem is set somewhat higher than pmem (Note: if you specify more memory per processor than the nodes have, the job will not start.).
  • By default your job's working directory is your home directory. To change that, use the batch system variable $PBS_O_WORKDIR, which contains the absolute path of the directory in which the qsub command was run. For other job environment variables please refer to the qsub manual page;
  • Before running the program it may be necessary to load the appropriate module for the MPI to have access to relevant parallel libraries (see our modules page);
  • Run your parallel program using mpiexec. Depending on how your program works you may want to reserve a full node but only run one process on it (i.e. on one processor). In that case use the -pernode parameter to mpiexec. This is typical for multi-threaded applications, where a single process uses multiple execution threads that can make use of more than one processor on a node. In the above pingpong example we use -pernode to ensure that we only run one process per node.
  • All PBS specific keywords are specified on lines that begin with "#PBS".
  • All comments are specified on lines that begin with one or more "#" (with due regard to the point above about PBS keywords).
  • Any line that is neither a comment nor a PBS keyword line is interpreted as pertaining to the job being run on the batch machine. Thus in the above example, once the batch job has started, it will first change directory to the specified location of the executable, load any needed modules, and then begin to execute the executable.
  • If you need to pass your current shell variables to your qsub job, use qsub -V myscript on the command line (or put #PBS -V in the script)
  • To combine standard error and standard out to one file, add #PBS -j oe
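The pmem rule in the list above comes down to simple arithmetic: a node must offer at least pmem times the number of processors requested on it. The sketch below works this out for the example script (pmem=2200mb, ppn=2). The helper names are hypothetical, and the node size of 16000 mb used in the check is an illustrative figure, not a documented value for Akka.

```shell
#!/bin/bash
# Hypothetical helper: memory (in mb) that one node must provide,
# computed as pmem * processors per node.
required_node_mem() {
    local pmem_mb=$1
    local ppn=$2
    echo $(( pmem_mb * ppn ))
}

# Hypothetical helper: succeed if the request fits on a node with the
# given amount of memory (in mb). The node size is supplied by the caller.
fits_on_node() {
    local pmem_mb=$1 ppn=$2 node_mem_mb=$3
    [ "$(required_node_mem "$pmem_mb" "$ppn")" -le "$node_mem_mb" ]
}

# Values from the example script: pmem=2200mb, ppn=2.
required_node_mem 2200 2

# 16000 is an illustrative node size only.
if fits_on_node 2200 2 16000; then
    echo "request fits on one node"
fi
```

With these numbers each node needs 4400 mb. Asking for the same pmem with ppn=8 would need 17600 mb per node, which would not fit on the illustrative 16000 mb node, so such a job would not start.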

The qsub command scans the lines of the script file for directives. An initial line in the script that begins with the characters "#!" or the character ":" will be ignored, and scanning will start with the next line. Scanning will continue until the first executable line, that is, a line that is not blank, not a directive line, and not a line whose first non-white space character is "#". If directives occur on subsequent lines, they will be ignored. The remainder of the directive line consists of the options to qsub in the same syntax as they appear on the command line. The option character is to be preceded by the "-" character. If an option is present in both a directive and on the command line, that option and its argument, if any, will be ignored in the directive. The command line takes precedence.

If an option is present in a directive and not on the command line, that option and its argument, if any, will be processed as if it had occurred on the command line.
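The scanning rules above can be imitated with a few lines of awk, which makes it easy to see which directives in a script would be picked up. This is a sketch written from the description above, not qsub's actual implementation, and scan_directives is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read a script on standard input and print the
# "#PBS" lines that a scan following the rules above would find.
scan_directives() {
    awk '
        NR == 1 && (/^#!/ || /^:/) { next }         # skip an initial "#!" or ":" line
        /^[ \t]*$/                 { next }         # blank lines do not stop the scan
        /^#PBS/                    { print; next }  # directive line
        /^[ \t]*#/                 { next }         # ordinary comment
                                   { exit }         # first executable line ends the scan
    '
}

scan_directives <<'EOF'
#!/bin/bash
#PBS -N Parallel

# a comment
#PBS -l walltime=00:30:00
cd $PBS_O_WORKDIR
#PBS -l nodes=2:ppn=2
EOF
```

Only the -N and walltime lines are printed. The nodes line comes after the first executable line (the cd command) and is therefore ignored, which is why all directives should be placed at the top of the script.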

If you would like to see more examples of PBS job submission files, go here.