FAQ
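This FAQ should help HPC2N users answer common questions regarding user accounts and projects, HPC2N systems, the batch system and batch jobs, file systems, compiling and compilers, and parallel software.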
User accounts and projects

Q: I have forgotten my user password. What should I do?
A: When you applied for a new user account you were asked to keep a copy of your original application in a secure place. We can reset your password to the temporary one mentioned in the application form. If you no longer have the form, we can also send you a new copy by mail, provided your address has not changed. In other cases you will have to apply for a new account using a different user account name. Let us know if you need to save some data from your old account. Please contact us via e-mail: support@hpc2n.umu.se.

Q: Where can I see how much CPU time my project used?
A: You can use the command projinfo -p <project_ID> -v. For more information see our projectinfo webpage.

HPC2N systems

Q: What is the physical CPU and memory layout of HPC2N nodes?
A: Each cluster node is an SMP (symmetric multiprocessing) system consisting of 8 cores (on Akka). In an SMP system, identical cores have uniform access to the local shared memory (16 GB on Akka). See image below (L1: Level 1 Cache, L2: Level 2 Cache, C: processor core, RAM: Random Access Memory, H: HyperTransport).
Batch system and batch jobs

Q: What is the maximum time a job can run?
A: A job can run for up to the number of allocated CPU hours per month divided by five. However, the maximum number of CPU hours any process can run is 120 (or 5 days). For more information see our batch system webpage.

Q: My job does not run/was placed in the "Blocked" part of the job queue, why?
A: The reason depends on the situation. Running the command checkjob <JobID> can provide you with more information. At the end of the output you can normally see the reason why a certain job was not allowed to run at that time. Typical reasons include:
For example, if you requested a walltime that exceeds the SOFT MAXPS limit (see The Batch system at HPC2N), checkjob will report a message similar to:

job cannot run in partition DEFAULT. (job 163022 violates active SOFT MAXPS limit of 28800000 for acct SNICXXX-YY-ZZZ (R: 21600000, U: 8327520))

As can be seen above, the requested time plus the time being used by your other jobs (R: 21600000 + U: 8327520 = 29927520) exceeds the SOFT MAXPS limit of 28800000 CPU seconds. In such a case you either have to decrease the walltime of your current job or wait until one or more of your running jobs finish (after which the current job will be moved automatically from the "Blocked" to the "Idle" part of the queue).

Q: Can I log in to computation nodes to see how my jobs are running?
A: We usually don't allow users to log in to computation nodes. One way to check the job status on other nodes is to use the job activity graphs on our webpage Graphs of cluster nodes during jobs.

Q: What combination of nodes and ppn should I use for a multi-threaded application?
A: At HPC2N we only allow processes of one user to run on a particular node. That way we prevent a situation in which a user with a multi-threaded application (which runs as one process, and is treated as such by the batch system, but actually uses multiple processors) competes with other users' ordinary processes. Supposing you want to run m multi-threaded processes on n processors, you need to make sure that each process is allocated to exactly one node:
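As a rough sketch of that rule, the shell function below derives a resource request for m multi-threaded processes using n processors in total, with one process per node. It reflects one possible reading of the rule, not an official HPC2N recipe. The function name pbs_resource_line is hypothetical, and the sketch assumes that n is a multiple of m, that the standard Torque syntax #PBS -l nodes=...:ppn=... applies, and that a node has at most 8 cores (the Akka figure given earlier in this FAQ).

```shell
#!/bin/bash
# Sketch only: one multi-threaded process per node, n/m cores per process.
# Assumptions (not stated in the FAQ): n is a multiple of m, and the
# Torque resource syntax "nodes=<count>:ppn=<cores per node>" is used.

CORES_PER_NODE=8   # Akka nodes have 8 cores according to this FAQ

pbs_resource_line() {
    local m=$1   # number of multi-threaded processes (one per node)
    local n=$2   # total number of processors requested

    # Reject requests that cannot be split evenly across the nodes.
    if [ "$m" -le 0 ] || [ "$n" -le 0 ] || [ $((n % m)) -ne 0 ]; then
        echo "error: n must be a positive multiple of m" >&2
        return 1
    fi

    local ppn=$((n / m))   # cores (threads) per process, i.e. per node

    # A single process cannot use more cores than one node provides.
    if [ "$ppn" -gt "$CORES_PER_NODE" ]; then
        echo "error: $ppn cores per process exceeds $CORES_PER_NODE cores per node" >&2
        return 1
    fi

    echo "#PBS -l nodes=${m}:ppn=${ppn}"
}

# Example: 2 processes sharing 16 processors in total.
pbs_resource_line 2 16
```

With these assumptions, the example call prints #PBS -l nodes=2:ppn=8, which requests two full Akka nodes with one 8-thread process on each.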
For more complex configurations please contact HPC2N support: support@hpc2n.umu.se.

Q: My jobs take very long to become scheduled for running on a cluster, what could be the problem?
A: If you did not specify a valid project in your submit file (using the #PBS -A directive), your job will be assigned a low priority in the job queue and run in the project account DEFAULT, which is shared among all users who don't have a SNIC project allocation (large, medium or small). This account is only given a small fraction of system resources and its main purpose is small-scale testing. To apply for a project please see the rules described on the SNIC homepage. A small level request should be sent directly to HPC2N by the Principal Investigator (PI). You can find more information here.

Q: I tried to submit a batch job and got an email with an error message similar to:

Unable to copy file /var/spool/torque/spool/<PBS job id>.OU to /home/u/username/jobfile.out
>>> error from copy
/bin/cp: cannot create regular file `/home/u/username/jobfile.out': Permission denied
>>> end error output

A: You have specified the output file location to be on the AFS file system. However, the batch system does not have an AFS access token, so it cannot write there. Instead, you will have to use your personal directory on our GPFS parallel file system in /pfs/nobackup/u/username. Please see the page about File systems for a description of the various file systems at HPC2N and how to use them.

File systems

Q: I cannot access files in my home directory (file attributes are set to '?').
A: Run the afslog command to obtain a new AFS token. If that does not help, it is likely that your Kerberos authentication ticket has expired (run klist to check the status). To obtain a new Kerberos ticket, issue the kinit command.

Q: Which files need to be set world-readable, and why?
A: AFS (and thus your home directory and its subdirectories) is backed up nightly.
The newest backed-up version can be found in the directory OldFiles/ in your home directory. You can simply copy the file you deleted from OldFiles/. If it has been more than 24 hours since you deleted the file, you need to contact support.

Compiling and compilers

Q: I need to use a specific compiler version with MPI. Which modules should I add?
A: First add the compiler, then the MPI module. For example: module add intel-compiler/10.1 openmpi/intel. If you do not need a specific compiler version, it is enough to write: module add openmpi/intel, which adds the default version of the compiler.

Q: Why does my compilation fail with: "*** Subscription: Unable to find a server."?
A: The above message occurs when all of our PathScale compiler licences are in use. Try again after a while (about 5-10 minutes).

Parallel Software

Q: What is the difference between mpich and mvapich?
A: mvapich is an implementation based on mpich that makes efficient use of the InfiniBand network.

Q: Can I disable usage of InfiniBand by OpenMPI?
A: Use the parameter -mca btl '^openib' with mpiexec. Keep in mind that this option is for testing purposes only, because your communication would then interfere with other gigabit Ethernet traffic (especially the /pfs/nobackup file system traffic).

Q: How do I increase the stack size of an OpenMP thread when running a PathScale(TM) Fortran program?
A: Add export PSC_OMP_STACK_SIZE=128m to your submit file to set the per-thread value to 128 MB.

Q: How can I get access to licensed software (e.g. VASP)?
A: We need confirmation from the licence holder that you may use the software, along with the licence number and/or the complete licence name.



