High Performance Computing Center North
The electrical maintenance scheduled for 19:00 on Wednesday evening did not go as planned.
Something went slightly wrong, causing one of the UPS units to fail. This in turn caused the cooling system to fail, which led to a rapid rise in temperature and a subsequent emergency power cut.
This caused so many problems that we will not be able to get things back online until tomorrow (Thursday).
We lost, among other things, the /pfs/nobackup filesystem, which is the reason that the queues have been stopped. We expect that jobs have failed.
We are sorry about this, but it was outside of our control.
*UPDATE 2015-08-20 08:40*
We are currently waiting for the vendor to take a look at the state of the /pfs/nobackup hardware.
We hope that it will be recoverable, but, hardware-wise, it is in a state we have never seen before.
Wed, 2015-08-19 21:48 | Åke Sandgren