PBS Job Status
qstat
The command qstat -a shows all jobs in the batch system (queued, held, and running) together with their job IDs. Example (run on Akka):
p-bc9901 [~/pfs]$ qstat -a
p-mn01.hpc2n.umu.se:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------------------------------------------------------------
353476.p-mn01.hpc2n. user1 batch H2 -- 1 -- -- 72:00 Q --
402126.p-mn01.hpc2n. user4 batch job5 -- 2 -- -- 120:0 H --
402127.p-mn01.hpc2n. user4 batch job6 -- 2 -- -- 120:0 H --
402128.p-mn01.hpc2n. user4 batch job7 -- 2 -- -- 120:0 H --
402129.p-mn01.hpc2n. user4 batch job8 -- 2 -- -- 120:0 H --
472294.p-mn01.hpc2n. user7 default temp4 -- 1 -- -- 05:10 Q --
472295.p-mn01.hpc2n. user7 default script4 -- 1 -- -- 05:10 Q --
472296.p-mn01.hpc2n. user7 default script3 -- 1 -- -- 05:10 Q --
472315.p-mn01.hpc2n. user7 default script2 -- 1 -- -- 05:10 Q --
472316.p-mn01.hpc2n. user7 default script1 -- 1 -- -- 05:10 Q --
472317.p-mn01.hpc2n. user7 default new_job -- 1 -- -- 05:10 Q --
472318.p-mn01.hpc2n. user7 default parallel -- 1 -- -- 05:10 Q --
493066.p-mn01.hpc2n. user7 batch tmp.sh 12922 12 -- -- 32:00 R 29:47
493073.p-mn01.hpc2n. user4 batch job_akka 8688 1 -- -- 92:00 R 27:31
493074.p-mn01.hpc2n. user4 batch my_job 8743 1 -- -- 92:00 R 27:33
493075.p-mn01.hpc2n. user4 batch my_serial 8786 1 -- -- 92:00 R 27:33
493076.p-mn01.hpc2n. user4 batch my_job2 8881 1 -- -- 92:00 R 27:32
493077.p-mn01.hpc2n. user4 batch my_job3 8923 1 -- -- 92:00 R 27:32
493078.p-mn01.hpc2n. user1 batch my_job4 8992 1 -- -- 92:00 R 27:31
472319.p-mn01.hpc2n. user1 default job_akka2 -- 1 -- -- 05:10 Q --
472320.p-mn01.hpc2n. user1 default job_akka3 -- 1 -- -- 05:10 Q --
472321.p-mn01.hpc2n. user1 default job_akka4 -- 1 -- -- 05:10 Q --
472322.p-mn01.hpc2n. user1 default openmp_job -- 1 -- -- 05:10 Q --
In the S (state) column, 'Q' = Queued, 'R' = Running, and 'H' = Held. The list can be very long, making it difficult to find your own jobs. In that case, add the -u flag to list only the jobs submitted by a specific user:
p-bc9901 [~/pfs]$ qstat -a -u user1
p-mn01.hpc2n.umu.se:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------------------------------------------------------------
353476.p-mn01.hpc2n. user1 batch H2 -- 1 -- -- 72:00 Q --
493078.p-mn01.hpc2n. user1 batch my_job4 8992 1 -- -- 92:00 R 27:31
472319.p-mn01.hpc2n. user1 default job_akka2 -- 1 -- -- 05:10 Q --
472320.p-mn01.hpc2n. user1 default job_akka3 -- 1 -- -- 05:10 Q --
472321.p-mn01.hpc2n. user1 default job_akka4 -- 1 -- -- 05:10 Q --
472322.p-mn01.hpc2n. user1 default openmp_job -- 1 -- -- 05:10 Q --
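When the listing is long, a per-state count can be easier to read than the full table. The following is a minimal sketch, not part of the batch system: the function name count_states is hypothetical, and it assumes that job lines start with a numeric job ID followed by a dot and that the state letter is the second-to-last column, as in the listings above.

```shell
# Hypothetical helper: count jobs per state from `qstat -a` output on stdin.
# Assumes job lines start with "<number>." and that the state letter (Q, R, H)
# is the second-to-last whitespace-separated column, as in the example output.
count_states() {
    awk '/^[0-9]+\./ { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Intended use on the cluster (not executed here):
#   qstat -a -u user1 | count_states
```

Job names containing spaces would shift the columns, so treat the result as a quick overview rather than an exact report.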
checkjob
To get more information about a specific job, use the command checkjob <job_id>. You get the <job_id> either when the job is submitted, or from running the above commands. This may sometimes help you see why the batch system is not starting your job. Example:
p-bc9901 [~/pfs]$ checkjob 493716
checking job 493716
State: Idle
Creds: user:user123 group:folk account:DEFAULT class:batch qos:DEFAULT
WallTime: 00:00:00 of 00:30:00
SubmitTime: Mon Nov 9 16:54:40
(Time Queued Total: 00:00:06 Eligible: 00:00:06)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Dedicated Resources Per Task: PROCS: 1 MEM: 1900M SWAP: 2000M
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE PREEMPTOR
PE: 1.00 StartPriority: -19333231
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 1 procs found)
idle procs: 56 feasible procs: 0
Rejection Reasons: [CPU : 662][ReserveTime : 10]
p-bc9901 [~/pfs]$
showq
Another useful command is showq, which shows the job queue from the perspective of Maui (the job scheduler). In many instances it gives more useful output, as it can immediately be seen how many jobs are running, idle, blocked, etc. It can be given the flag -u <username> to limit the output to the jobs belonging to that user.
p-bc9901 [~/pfs]$ showq -u user123
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
0 Active Jobs 5200 of 5288 Processors Active (98.34%)
661 of 661 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
500207 user123 Idle 4 00:04:00 Fri Nov 13 14:26:54
500208 user123 Idle 4 00:04:00 Fri Nov 13 14:26:54
500209 user123 Idle 4 00:04:00 Fri Nov 13 14:26:55
500211 user123 Idle 4 00:04:00 Fri Nov 13 14:27:22
500212 user123 Idle 4 00:04:00 Fri Nov 13 14:27:23
500213 user123 Idle 4 00:04:00 Fri Nov 13 14:27:23
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
500203 user123 Idle 1 00:30:00 Fri Nov 13 14:26:26
500204 user123 Idle 1 00:30:00 Fri Nov 13 14:26:27
500205 user123 Idle 1 00:30:00 Fri Nov 13 14:26:38
500206 user123 Idle 1 00:30:00 Fri Nov 13 14:26:38
500210 user123 Idle 1 00:30:00 Fri Nov 13 14:27:16
500214 user123 Idle 4 00:04:00 Fri Nov 13 14:27:49
Total Jobs: 12 Active Jobs: 0 Idle Jobs: 6 Blocked Jobs: 6
p-bc9901 [~/pfs]$
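Blocked jobs are usually the ones worth inspecting with checkjob. As a rough sketch, the job IDs in the BLOCKED JOBS section can be pulled out of the showq output and passed on. The function name blocked_jobs is hypothetical, and the section header and column layout are assumed to match the example output above.

```shell
# Hypothetical helper: print the IDs of jobs listed under "BLOCKED JOBS"
# in `showq` output read from stdin.
# Assumes the section header starts with "BLOCKED JOBS" and that job lines
# begin with a purely numeric job ID followed by several more columns.
blocked_jobs() {
    awk '/^BLOCKED JOBS/ { in_blocked = 1; next }
         in_blocked && NF > 3 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Intended use on the cluster (not executed here):
#   showq -u user123 | blocked_jobs | while read -r id; do checkjob "$id"; done
```

The NF > 3 condition is there to skip short summary lines such as "6 Idle Jobs", which also begin with a number.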
showstart
This command can be used to get a (very) rough estimate of when the job will start. Note that jobs may start sooner or later, depending on the priority of other (newer) jobs and the speed with which the currently running jobs finish. You run it with showstart <job_id>.
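To ask for start estimates for all of your queued jobs at once, the job IDs can be taken from the qstat listing. This is only a sketch under stated assumptions: the function name queued_ids is hypothetical, the qstat column layout is assumed to match the example earlier on this page, and showstart is assumed to accept the numeric part of the job ID (as checkjob does in the example above).

```shell
# Hypothetical helper: print the numeric IDs of queued (state Q) jobs
# from `qstat -a` output read from stdin.
# Assumes the state letter is the second-to-last column and that the job ID
# has the form "<number>.<server>", as in the example listings.
queued_ids() {
    awk '/^[0-9]+\./ && $(NF-1) == "Q" { split($1, parts, "."); print parts[1] }'
}

# Intended use on the cluster (not executed here):
#   qstat -a -u user1 | queued_ids | while read -r id; do showstart "$id"; done
```

The estimates remain rough for the reasons given above, so they are best read as an ordering hint rather than a schedule.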



