High Performance Computing Center North
There is a problem with the /pfs/nobackup file system as mounted on the login and compute nodes; all accesses to it will hang.
If your .bashrc or other login scripts touch that file system, your login attempt will likely appear to hang as well.
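As a workaround, login scripts can probe the file system with a timeout before using it. This is a hypothetical sketch (the `SCRATCH` variable and the 2-second limit are examples, not site policy); `timeout` is part of GNU coreutils:

```shell
# Hypothetical ~/.bashrc guard: only touch /pfs/nobackup if the mount
# responds within 2 seconds, so a hung file system cannot block login.
if timeout 2 stat /pfs/nobackup >/dev/null 2>&1; then
    export SCRATCH="/pfs/nobackup/$USER"
fi
```

If the mount is unresponsive, the `stat` call is killed after 2 seconds, the `if` branch is skipped, and the shell starts normally without `SCRATCH` set.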
*UPDATE 2015-02-28 11:15*
The problem has been reported to the vendor and we are waiting for them to get back to us.
The batch queues have been stopped so no more jobs will be affected.
*UPDATE 2015-03-01 12:45*
The problem has been fixed, so jobs and PFS should once again be operational.
The login node of Akka is experiencing severe disk problems.
The disk will be replaced first thing on Monday (23/2) morning and the node reinstalled.
Until then it will be kept offline.
This does not in any way affect running or queued jobs.
Any data can of course be picked up by logging in to Abisko.
*UPDATE 2015-02-23 07:00*
The login node of Akka is now back online
Sun, 2015-02-22 17:27 | Åke Sandgren
Due to an unexpected internal change in the last kernel update, both AFS (serving your normal home directory) and Lustre (serving /pfs/nobackup) stopped working.
We have therefore blocked all queues to make sure no more nodes reboot into the problematic kernel.
The problem does not affect running jobs.
We will remove this kernel and resume operation as soon as possible.
*UPDATE 2015-02-04 11:35 CET*
Both systems are now back in normal production.
The /pfs/nobackup file system is currently suffering from a misconfiguration.
The file system was created with fewer inodes than intended and we are in short supply at the moment.
This causes creation of files or directories to fail once we run out of inodes, which in turn may cause jobs to fail.
At the time of writing (2015-01-12 16:18 CET) we have ~4 million inodes available so the problem is not immediate but depending on what jobs are currently in the queue this could change quickly.
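If you want to check the current inode situation yourself, the standard `df -i` command reports inode usage for a mount point (the path below is the one named above; the `awk` formatting is just an example):

```shell
# Print the number of free inodes on /pfs/nobackup.
# Line 2 of `df -i` output holds the numbers; field 4 is "IFree".
df -i /pfs/nobackup | awk 'NR==2 {print $4 " inodes free"}'
```

Jobs that create very many small files consume inodes quickly, so cleaning up or archiving such file trees helps conserve them.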
The batch master server for Abisko needs hardware maintenance and the /pfs/nobackup file system needs a reconfiguration.
We will begin this on Wednesday Jan 21st, starting at 08:00 CET. Unfortunately the /pfs/nobackup maintenance took longer than expected, so we are still working on this on Thursday Jan 22.
A reservation has been put in place on the batch systems, which means that jobs will not be started unless they are short enough to finish before the maintenance begins.
The login nodes of both clusters will also be disabled.
During the Christmas holidays we are running with reduced staffing.
We will try to solve any arising problems as quickly as possible, but there will be delays due to this.
We will be back at full capacity on Jan 7th.
HPC2N staff wish you all a Merry Christmas and a Happy New Year.
We had some RAID problems on our batch server, meaning (among other things) that no jobs could be submitted.
It was necessary to reboot the server. Everything seems to be working correctly again.
Fri, 2014-11-14 12:03 | Birgitte Brydsö