system news

PFS down (resolved)

Posted on: 28 April 2016
By: admin

As is tradition lately, PFS is again inaccessible and attempts to use it will hang.
Batch queues have been suspended and we're poking the storage system.

Update 2015-08-08 21:39 CEST:
The filesystem is back online and the queues have been resumed.

Fri, 2015-08-07 15:05 | Lars Viklund

PFS inaccessible

Posted on: 28 April 2016
By: admin

We are again having problems with the PFS system. File access is slow or hangs. We are investigating, and more information will follow as soon as we have it.

The queues have been paused and will be resumed when access to PFS has been restored.

Update 12:26: The PFS system is back online and all queues have been resumed. We apologize for any inconvenience the problem caused. If you notice further problems with PFS, please report them to support@hpc2n.umu.se.

Problem with pfs

Posted on: 28 April 2016
By: admin

We are currently experiencing problems with the PFS system. File access is slow or hangs. We are looking for the cause, and more information will follow as soon as we have it.

The queues have been paused and will be resumed when access to PFS has been restored.

Update 20:44: The PFS system is back online and all queues have been resumed. We apologize for any inconvenience. If you notice any further problems with the PFS file system, please report them to support@hpc2n.umu.se.

Driver issues with Abisko interconnect

Posted on: 28 April 2016
By: admin

We are experiencing driver issues with the InfiniBand interconnect. This might lead to jobs failing with errors such as "Invalid resource type" and "Invalid CQ event".

We are in the process of updating the drivers, and we hope that this will solve the issues. Since we are unwilling to abort running jobs, we do not know exactly when the update will be fully finished, but we expect it to be done before the end of the week.

Mon, 2015-07-06 09:38 | Roger Oskarsson

/pfs/nobackup file system has failed

Posted on: 28 April 2016
By: admin

The /pfs/nobackup file system failed in a more severe way this morning.

All queues have been stopped, and any access to the file system hangs.
(Thus anyone whose .bashrc does things in the file system will probably not be able to log in at all.)

We are chasing the vendor for a solution to the multiple problems.

Tue, 2015-06-02 08:08 | Åke Sandgren

The /pfs/nobackup problems from 2015-05-27 are currently solved

Posted on: 28 April 2016
By: admin

The problems we had with the /pfs/nobackup filesystem are currently solved.

We are still waiting for the vendor to tell us exactly what happened and how to make sure it doesn't happen again.

For the time being, things are expected to be back to normal.

Some jobs may have failed due to the timeouts that resulted from the problem, but as far as we can tell at the moment, no files have been lost.

We apologize for this and will try to minimize the risk of something like this happening again.

/pfs/nobackup file system having problems again

Posted on: 28 April 2016
By: admin

The /pfs/nobackup filesystem is once again experiencing problems.

We are currently investigating together with the vendor to find a solution.

The batch queues have been stopped due to the severity of the problem.

The login nodes are also currently stopped.

Wed, 2015-05-27 16:07 | Åke Sandgren

/pfs/nobackup back online

Posted on: 28 April 2016
By: admin

The /pfs/nobackup file system is now back online.

Some files and directories may have been lost.
We have sent an email to the users we know are affected.

Please let us know if you find missing files or directories.

There is, however, no way to retrieve any lost data.

We are sorry about this and we are working with the vendor to reduce the possibility of it happening again.

pfs read-only

Posted on: 28 April 2016
By: admin

There is currently (2015-03-23 15:11) an outage of the PFS filesystem, rendering it read-only on all HPC2N resources.

Batch queues have been suspended until the problem is resolved.

Update 2015-03-23 16:13:
The problem seems complex and it is unlikely that it will be resolved today.

Update 2015-03-24 18:13:
We believe the problem has been resolved. PFS is again accessible and batch queues have been resumed.

Update 2015-03-24 19:30:
The file system went back to being mounted read-only. The problem is still there.

pfs problems

Posted on: 28 April 2016
By: admin

We were experiencing problems with PFS on Akka and Abisko on Friday the 13th.

Update 2015-03-13 20:10:
/pfs/nobackup is now back in production.

Fri, 2015-03-13 17:30 | Birgitte Brydsö