system news

/pfs/nobackup currently not responding again, ALL batch queues stopped (FIXED AGAIN)

  • Posted on: 28 April 2016
  • By: admin

The /pfs/nobackup filesystem stopped responding again.

We're working on getting it back online.

*UPDATE 2015-10-16 10:05*
The file system is now back online again and the queues are running.

*UPDATE 2015-10-16 12:35*
The file system is non-responsive again. We're trying to get it back online as soon as possible.

*UPDATE 2015-10-16 20:00*
The file system is now back online and the problem has been identified.

Fri, 2015-10-16 08:28 | Åke Sandgren

Power maintenance 2015-09-18 06:30, batch nodes will be drained of running jobs.

  • Posted on: 28 April 2016
  • By: admin

The power company is performing maintenance on the high-voltage feed to the University on Friday, September 18th, at 06:30.

This will result in a total loss of power to the whole University, so we need to drain the batch nodes.

During the days leading up to the maintenance window, it is advisable to submit shorter jobs that can finish in the time remaining before the window starts. To allow a little margin, the system will not allow jobs to run after 06:20 on the 18th.
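As a rough illustration of the arithmetic, the sketch below computes the longest walltime a job can request and still finish before the 06:20 cutoff. The function names and the D-HH:MM:SS output format are assumptions made for this example and are not taken from the notice; adapt the format to whatever your batch system expects.

```python
from datetime import datetime, timedelta

# Cutoff taken from the notice: no jobs may run after 06:20 on 2015-09-18.
CUTOFF = datetime(2015, 9, 18, 6, 20)


def max_walltime(now, cutoff=CUTOFF):
    """Return the longest walltime that still ends at or before the cutoff.

    Returns a zero timedelta if the cutoff has already passed.
    """
    remaining = cutoff - now
    return max(remaining, timedelta(0))


def format_walltime(delta):
    """Format a timedelta as D-HH:MM:SS (format chosen for illustration)."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Example: a job submitted at noon on 16 September 2015.
    submitted = datetime(2015, 9, 16, 12, 0)
    print(format_walltime(max_walltime(submitted)))
```

A job that also has to wait in the queue will have less time than this, so request a walltime comfortably below the computed maximum.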

Batch queues stopped due to /pfs/nobackup being out of inodes (files). (Partially fixed)

  • Posted on: 28 April 2016
  • By: admin

We have unfortunately been forced to stop all batch queues on the clusters.

/pfs/nobackup has run out of inodes. Something has probably created more inodes (files) than intended.

We are working on finding out where and getting the usage down to normal levels.

Until this is fixed, we need to keep the batch queues stopped so that jobs do not fail because they cannot create new files.
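For users who want to check whether their own directories hold an unexpectedly large number of files, a minimal sketch follows. It counts files and directories under each top-level entry of a given directory. The function name `count_inodes` is hypothetical, and this is not the method the administrators used. Walking a large parallel file system this way is slow and loads the metadata servers, so run it only on your own directories.

```python
import os
from collections import Counter


def count_inodes(root):
    """Count files and directories under each top-level entry of root.

    Symbolic links are counted as one entry and are not followed.
    """
    counts = Counter()
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            n = 1  # the top-level directory itself
            for _dirpath, dirnames, filenames in os.walk(entry.path):
                n += len(dirnames) + len(filenames)
            counts[entry.name] = n
        else:
            counts[entry.name] = 1
    return counts


if __name__ == "__main__":
    # Print the ten top-level entries with the most files and directories.
    for name, n in count_inodes(".").most_common(10):
        print(f"{n:10d}  {name}")
```

Directories at the top of the output are the first candidates for cleaning up or packing into an archive file.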

pfs problems (solved)

  • Posted on: 28 April 2016
  • By: admin

The /pfs/nobackup (Lustre) file system is currently unavailable due to the aftereffects of a problem during electrical maintenance (see the "Electrical maintenance causing problems" post).

We apologize for any inconvenience this may cause.

We are currently working on restoring access to pfs, but we do not have an ETA right now.

This news will be updated with more information when we have it.

*UPDATE 20150820 15:45*
It will most likely take at least until Friday the 21st before we can get this resolved.

/pfs/nobackup filesystem now back in production

  • Posted on: 28 April 2016
  • By: admin

The filesystem is now back in production.

As far as we can tell, no files or directories were lost, but if you do find evidence of that, please notify support@hpc2n.umu.se so we can report it to the vendor.

We're sorry about the long downtime, but we verified each step with the vendor to do our best not to lose any data.

Mon, 2015-08-24 11:32 | Åke Sandgren

Electrical maintenance causing problems

  • Posted on: 28 April 2016
  • By: admin

The electrical maintenance that was scheduled for 19:00 on Wednesday evening did not go without problems.

Something went slightly wrong, causing one of the UPSes to fail. This in turn caused the cooling system to fail, leading to a rapid rise in temperature and a subsequent emergency power cut.

This caused so many problems that we will not be able to get things back online until tomorrow (Thursday).

We lost, among other things, the /pfs/nobackup filesystem, which is the reason that the queues have been stopped. We expect that jobs have failed.

Scheduled electrical maintenance, Wed Aug 19 09:00-14:00

  • Posted on: 28 April 2016
  • By: admin

On Wednesday August 19 09:00-14:00 all compute clusters will be unavailable due to scheduled maintenance of the high-voltage electrical infrastructure.

No compute jobs extending into the downtime window will start. Consider scheduling shorter jobs if possible, to make maximum use of the systems before the downtime.

Tue, 2015-08-11 09:33 | Niklas Edmundsson

PFS down (resolved)

  • Posted on: 28 April 2016
  • By: admin

As is tradition lately, PFS is again inaccessible and attempts to use it will hang.
Batch queues have been suspended and we're poking the storage system.

Update 2015-08-08 21:39 CEST:
The filesystem is back online and the queues have been resumed.

Fri, 2015-08-07 15:05 | Lars Viklund

PFS inaccessible

  • Posted on: 28 April 2016
  • By: admin

We are again having problems with the pfs system. File access is slow or hanging. We are investigating, and more information will follow as soon as we have it.

The queues have been paused and will be resumed when access to pfs has been restored.

Update 12:26: The PFS system is back online and all queues have been resumed. Sorry about any inconvenience the problem caused. If you notice further problems with the pfs, please report them to support@hpc2n.umu.se.


Updated: 2025-04-29, 19:26