HPC2N - Support - The Batch system: Tips

Tips

 

  • To see only your own jobs, run qstat -u <username>.
  • Any installed program can be run interactively (qsub -I [requirements...]). Remember to use the "module load <program>" command to access it after the interactive job has started.
  • Programs that open a display can also be run interactively. Just add the -v DISPLAY option to qsub, which exports your DISPLAY variable to the job.
  • To see which nodes your job is using on one of the cluster machines, run: cat $PBS_NODEFILE. You can run this inside an interactive job or add it to the job script.
  • If there is a problem and stdout/stderr is not written to file, you can instead redirect the output directly in the submit file.
    Example (MPI - remember to load an MPI module first!)
    stdout/stderr in same file:
    mpirun ... >output 2>&1
    
    stdout/stderr in separate files:
    mpirun ... >output 2>error
    
  • When running OpenMP programs, all the processors must be on the same node to benefit from shared memory. Check how many processors a node has and do not ask for more than that.
  • If you run several jobs from one script, you do not want the output-files/error-files for the later ones to overwrite the earlier ones. This can be solved by using names of the following type for the output/error files:
    #PBS -o myjob.$PBS_JOBID.out
    #PBS -e myjob.$PBS_JOBID.err
    
    The file names will then contain the job-id, and so have a separate name for each job.
  • If your parallel program prints output, the order of the print statements will by default be random.
  • Remember that the number of processors you ask for per node (ppn) cannot be larger than the number of processors on each node of the machine in question.
  • If you do not specify how much memory you want, you will be given the default amount, which is 1900mb on Akka (pmem=1900mb and pvmem=2000mb). If you do not know how much memory your program needs and you have already tried running it, you can look at https://www.hpc2n.umu.se/node/226 to see how much memory your job used.
  • The maximum amount of memory you can ask for on Akka is pmem=15700mb and pvmem=16000mb. pvmem must always be greater than pmem.
  • The batch system will allocate a job to a combination of nodes where each node has at least pmem * requested-number-of-processors-per-node of memory (the default pmem is 1900mb on Akka).
  • Note: if you specify a combination (mem/procs) that requires more memory per processor than the nodes have, the job will not start.
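
The ppn and memory rules above can be checked before submitting a job. The following is a minimal sketch, not part of the batch system: the function name check_request and its arguments are hypothetical, and the number of processors and the amount of memory per node are values you have to supply yourself for the machine in question.

```shell
# Sanity-check a resource request before handing it to qsub.
# Hypothetical helper; it only does arithmetic and never contacts the batch system.
#
# Usage: check_request PPN PMEM_MB PVMEM_MB NODE_CORES NODE_MEM_MB
# Prints "ok" and returns 0 if the request can fit on a node,
# otherwise prints the reason and returns 1.
check_request() {
    ppn=$1; pmem=$2; pvmem=$3; node_cores=$4; node_mem=$5

    # ppn cannot be larger than the number of processors on a node.
    if [ "$ppn" -gt "$node_cores" ]; then
        echo "ppn=$ppn exceeds the $node_cores processors on a node"
        return 1
    fi

    # pvmem must always be greater than pmem.
    if [ "$pvmem" -le "$pmem" ]; then
        echo "pvmem=${pvmem}mb must be greater than pmem=${pmem}mb"
        return 1
    fi

    # Each node must have at least pmem * ppn of memory.
    needed=$((pmem * ppn))
    if [ "$needed" -gt "$node_mem" ]; then
        echo "need ${needed}mb per node, but a node has ${node_mem}mb"
        return 1
    fi

    echo "ok"
}
```

For example, assuming a node with 8 processors and 15700mb of usable memory (assumed values, check them for your machine), check_request 8 1900 2000 8 15700 prints "ok", while check_request 8 2000 2100 8 15700 reports that 16000mb per node would be needed.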