system news

Batch queues stopped on Abisko and Akka

  • Posted on: 28 April 2016
  • By: admin

Due to an unexpected internal change in the last kernel update, both AFS (serving your normal home directory) and Lustre (serving /pfs/nobackup) stopped working.

We have therefore blocked all queues to make sure no more nodes reboot into the problematic kernel.

The problem does not affect running jobs.

We will remove this kernel and resume operation as soon as possible...

*UPDATE 2015-02-04 11:35 CET*
Both systems are now back in normal production.

Problem with /pfs/nobackup file system

  • Posted on: 28 April 2016
  • By: admin

The /pfs/nobackup file system is currently suffering from a misconfiguration.

The file system was created with fewer inodes than intended and we are in short supply at the moment.

Once we run out of inodes, creation of files or directories will fail, which in turn may cause jobs to fail.

At the time of writing (2015-01-12 16:18 CET) we have ~4 million inodes available, so the problem is not immediate, but depending on which jobs are currently in the queue this could change quickly.
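For users who want to keep an eye on the inode situation themselves, the following is a minimal sketch of how inode counts can be read from Python with the standard `os.statvfs` call. The function names `inode_usage` and `inodes_low` and the 4,000,000 threshold are illustrative assumptions, not part of any HPC2N tooling; the same numbers are reported by `df -i` on the command line.

```python
import os


def inode_usage(path):
    """Return (total, free, used_fraction) for the file system holding `path`.

    Uses os.statvfs (Unix only). Some file systems report 0 total inodes,
    in which case used_fraction is returned as 0.0.
    """
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes
    free = st.f_favail   # inodes available to unprivileged users
    used_fraction = 0.0 if total == 0 else 1.0 - free / total
    return total, free, used_fraction


def inodes_low(free, threshold=4_000_000):
    """Hypothetical check: True if fewer than `threshold` inodes remain."""
    return free < threshold


if __name__ == "__main__":
    # "/pfs/nobackup" is the path from the announcement; "." is used here
    # so the sketch also runs on machines without that mount.
    total, free, used = inode_usage(".")
    print(f"inodes: total={total} free={free} used={used:.1%}")
    print("low on inodes" if inodes_low(free) else "inode supply looks fine")
```

Replace `"."` with `/pfs/nobackup` to query that file system on a login node. The threshold is arbitrary and should be chosen to suit your own jobs.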

/pfs/nobackup file system maintenance finally done

  • Posted on: 28 April 2016
  • By: admin

The batch master server for Abisko needs hardware maintenance and the /pfs/nobackup file system needs a reconfiguration.

We will begin this on Wednesday Jan 21st, starting at 08:00 CET. Unfortunately, the /pfs/nobackup maintenance took longer than expected, so we're still working on this on Thursday Jan 22.

A reservation has been put in place on the batch systems, which means that jobs will not be started unless they are short enough to finish before that time.
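The effect of such a reservation can be sketched as a simple time comparison: a job is eligible to start only if its requested run time ends no later than the start of the maintenance window. The function name `can_start` is hypothetical, the year 2015 is inferred from the surrounding posts, and the real batch scheduler may treat the boundary case differently.

```python
from datetime import datetime, timedelta


def can_start(now, walltime, reservation_start):
    """True if a job started at `now` with requested `walltime`
    would finish at or before `reservation_start`."""
    return now + walltime <= reservation_start


# Maintenance start from the announcement: Wednesday Jan 21, 08:00.
# The year is an assumption; the post does not state it.
maintenance = datetime(2015, 1, 21, 8, 0)
now = datetime(2015, 1, 20, 8, 0)

print(can_start(now, timedelta(hours=12), maintenance))  # short job fits
print(can_start(now, timedelta(hours=48), maintenance))  # long job must wait
```

In practice this means that requesting a shorter wall time, where the job allows it, gives the job a chance to run before the maintenance begins.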

The login nodes of both clusters will also be disabled.

Christmas holidays

  • Posted on: 28 April 2016
  • By: admin

During the Christmas holidays we are running with reduced staffing.

We will try to solve any problems that arise as quickly as possible, but the reduced staffing means there will be delays.

We will be back at full capacity on Jan 7th.

HPC2N staff wish you all a

Batch system OK

  • Posted on: 28 April 2016
  • By: admin

We had some RAID problems on our batch server, meaning (among other things) that no jobs could be submitted.

It was necessary to reboot the server. Everything seems to be working correctly again.

Fri, 2014-11-14 12:03 | Birgitte Brydsö

The new center storage system is now available

  • Posted on: 28 April 2016
  • By: admin

The new center storage is now in production.

The /pfs/nobackup file system is now larger and faster, ... finally.

Almost all users have been synchronized to the new file system.
The few remaining users (those affected will get a separate mail) have been blocked from logging in, and their jobs have been put on hold until the transfer is complete for each user.

Jobs are running again and login has been opened (see exception above).

If you notice anything strange please notify support@hpc2n.umu.se.

New center storage system

  • Posted on: 28 April 2016
  • By: admin

Dear Users,

We apologize for the late notification. However, we have some really good news.

During the summer and early autumn, HPC2N procured, tested and deployed a new center storage system that will replace the old, aging GPFS-based system. The new system is a DDN SFA 12KX and EXAScaler solution using Lustre as the underlying file system. The new storage system provides 1 PB of storage and will be up to 25 times faster than the old one, depending on I/O pattern.
 


Updated: 2025-04-29, 19:26