system news

Problems with project storage (2021-06-11) *SOLVED 2021-06-12 01:17*

  • Posted on: 11 June 2021
  • By: brorerik

Problems with project storage: mkdir fails and new directories cannot be created.

The exact cause is not known, but we are actively debugging the problem together with our vendor. While debugging, we have blocked new batch jobs from starting. You can still submit jobs, but they will not start until the problem is solved.

 

* SOLVED 2021-06-12 01:17 *

The problem was finally identified and a workaround put in place.

The system is now back in production.

Minor upgrade to $HOME and project storage file servers 2021-05-26 - FINISHED

  • Posted on: 24 May 2021
  • By: ake

During the period 2021-05-26 to 2021-05-27 we will be doing a minor upgrade to the file servers for $HOME and the project storage.

The upgrade will be done in two stages to avoid requiring a full downtime.

There will be an initial, shorter service interruption on Wednesday around lunchtime, followed by a period of reduced performance.

There will then be a second, slightly longer service interruption, followed by another period of reduced performance.

When the second interruption occurs depends on how fast the first stage of the upgrade goes.

 

Upgrade to Ubuntu Focal on clusters starting 2021-04-19 *UPDATED 2021-04-27, DONE*

  • Posted on: 12 April 2021
  • By: ake

Since Ubuntu Xenial 16.04, the operating system version currently running on our clusters, reaches its End-Of-Life on 2021-04-30, we are upgrading to Ubuntu Focal 20.04.

The upgrade process will start on 2021-04-19 and will be done with minimal impact on users running jobs.

Access to user data will not be affected by this upgrade.

A test environment is already available for users to try out the new system.

It is imperative that users test it as soon as possible and notify us of any missing software.

Problems with home-directories and project storage (2021-04-01) *SOLVED 2021-04-06*

  • Posted on: 1 April 2021
  • By: roger

We are seeing intermittent file server crashes, which cause problems with the file systems for $HOME and project storage. Users will mostly notice this as logins getting stuck after authentication and/or very slow file system access (a simple ls might take minutes).

The exact cause is not known, but we are actively debugging the problem together with our vendor. While debugging, we might block new batch jobs from starting. You can still submit jobs, but they will not start.

 

Cluster maintenance at HPC2N 2021-03-22 - 2021-03-25, *FINISHED*

  • Posted on: 5 March 2021
  • By: ake

Dear users,

During this maintenance, 2021-03-22 - 2021-03-25, we’re going to do some upgrades on the parallel file system, where home directories and project storage are located, along with other upgrades on Kebnekaise itself.

Since this maintenance affects the parallel file system, we have to drain the batch nodes of running jobs. During that period, logins will be disabled and active sessions will be terminated.

Migrating away from /pfs/nobackup at HPC2N

  • Posted on: 28 January 2021
  • By: bbrydsoe

Dear PIs and users at HPC2N.

As you hopefully already know, our new storage system has been in full production since November.

There is still some work to be done by You, the user, to make the transition to Project Storage complete.
All data in your /pfs/nobackup$HOME space must be moved to a Project Storage directory or to your $HOME space depending on the type and amount of data.
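
A minimal sketch of such a transfer in Python, assuming the old data sits under /pfs/nobackup$HOME as described above. The destination path is not given in this announcement, so `dst` is a hypothetical placeholder, and the helper names `copy_tree` and `missing_or_mismatched` are illustrative only. The sketch copies rather than moves, so that the old data can be removed by hand once the check comes back empty.

```python
import os
import shutil
from pathlib import Path


def copy_tree(src: Path, dst: Path) -> int:
    """Copy everything under src into dst and return the file count now under dst."""
    # symlinks=True keeps symbolic links as links instead of following them.
    # dirs_exist_ok=True (Python 3.8+) allows re-running after an interruption.
    shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True)
    return sum(len(files) for _root, _dirs, files in os.walk(dst))


def missing_or_mismatched(src: Path, dst: Path) -> list:
    """List source files that are absent from dst or differ in size there."""
    problems = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            source_file = Path(root) / name
            target_file = dst / source_file.relative_to(src)
            if not os.path.lexists(target_file):
                problems.append(source_file)
            elif (not source_file.is_symlink()
                  and source_file.stat().st_size != target_file.stat().st_size):
                problems.append(source_file)
    return problems


# Example usage (the destination is a hypothetical placeholder, adjust it):
#   src = Path("/pfs/nobackup" + os.environ["HOME"])
#   dst = Path("<your Project Storage directory>") / "from_nobackup"
#   copy_tree(src, dst)
#   print(missing_or_mismatched(src, dst))  # an empty list means nothing is missing
```

A size comparison only catches missing or truncated files. Comparing checksums would be a stricter check, at the cost of reading all the data twice.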

2021-01-13 CVMFS issues affect local software and modules *Resolved*

  • Posted on: 13 January 2021
  • By: nikke

We are currently having issues with the CVMFS subsystem on all HPC2N machines. This affects local software and modules, amongst other things.

A fix is in progress, but it might take a while before everything is sorted out.

We apologize for the inconvenience.

 

2021-01-13 11:59

The problem has been resolved.

Bus error on Kebnekaise

  • Posted on: 26 December 2020
  • By: zao

As a side effect of the recent file system upgrade, we are observing a small set of user-installed programs crashing with a bus error when loading dynamic libraries from the PFS file system. We are working with the vendor to find the root cause of these crashes and are running some bulk operations on the file system to mitigate the problem.

The problem is elusive: it may affect only a particular set of nodes, and it may disappear after hashing or otherwise fully reading the affected files on that node, or after reinstalling the affected files in another location.
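
One way to fully read a set of files on a given node is to hash them. The following Python sketch is an illustration only, not a confirmed fix: it walks a directory, reads every file matching a pattern from start to end, and reports a SHA-256 digest for each. The `*.so*` pattern, the function names `sha256_of` and `hash_libraries`, and the directory you pass in are all assumptions; point it at wherever the affected software is installed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_libraries(root: Path, pattern: str = "*.so*") -> dict:
    """Hash every regular file under root whose name matches pattern.

    The default pattern is an assumption meant to catch shared libraries
    such as libfoo.so and libfoo.so.1.
    """
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob(pattern))
        if path.is_file()
    }


# Example usage (the install prefix is a hypothetical placeholder):
#   for name, digest in hash_libraries(Path("<your install prefix>")).items():
#       print(digest, name)
```

Because the problem may be tied to particular nodes, such a script would have to run on the node where the crash occurs, for example at the start of the affected job.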

Updated: 2024-06-25, 16:43