system news

Problem with HPC2N mail support address. *SOLVED*

  • Posted on: 15 November 2017
  • By: roger

We are currently experiencing problems with our support address <support@hpc2n.umu.se>. The issue appears to be SNIC-wide, and SNIC is working on it. This news item will be updated when more information is available.

Update 2017-11-15 14:20

The problem should now be solved, and it is again possible to send support requests to our support address.

Software stack on Kebnekaise rebuilt 20171018

  • Posted on: 18 October 2017
  • By: ake

Today, 2017-10-18 13:40, we switched over the software stack on Kebnekaise to one that has been rebuilt from scratch.

All user-level codes and most libraries and helper programs have been rebuilt.

We have tried to make sure that nothing our users depend on is missing, but should a job fail due to a missing library or similar, please notify support@hpc2n.umu.se immediately and we will fix the problem.
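If a job that worked before the rebuild now fails, one quick self-check before contacting support is to confirm that the modules your job script loads still exist in the rebuilt stack. This is a sketch assuming the Lmod module system; `GCC` and `Python` are example package names, not a statement about what was rebuilt:

```shell
# Search the whole rebuilt stack for a package, including versions
# that Lmod hides behind toolchain hierarchies:
module spider GCC

# List the versions visible from the currently loaded toolchain:
module avail Python

# Show which modules are loaded in the current session, to compare
# against what the failing job script expects:
module list
```

If a version your script loads is no longer listed, include that module name in your mail to support.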

Maintenance window, batch jobs will not run (2017-09-04 08:00)

  • Posted on: 25 August 2017
  • By: ake

On September 4th at 08:00 CEST we will have a maintenance window to change some parameters of the Lustre file system.

To do this we need to drain all running jobs from the clusters and reboot all nodes, including the login nodes.

As we get closer to that point in time, jobs will not be allowed to start if their requested runtime is too long to fit before 2017-09-04 08:00.


In other words, it is a good idea to submit jobs with shorter requested runtimes.
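As a sketch of what that looks like in practice (assuming the Slurm batch system; the job name and program are placeholders), a job script that requests a short wall time so the scheduler can still fit it in before the maintenance window might be:

```shell
#!/bin/bash
# Placeholder job script. --time sets the requested runtime that the
# scheduler compares against the time remaining before the window.
#SBATCH --job-name=short-job
#SBATCH --time=02:00:00      # 2 hours: short enough to finish before 08:00

srun ./my_program            # my_program is a placeholder
```

The lower the `--time` value, the later your job can still be started as the window approaches.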

2017-06-29: Disruptive Lustre server issues, queues suspended (22:45 queues now re-enabled)

  • Posted on: 29 June 2017
  • By: nikke

Node reboots to activate security updates seem to have triggered strange, disruptive behavior in the Lustre service, affecting all compute resources.

We are currently diagnosing the issue in order to come up with a solution.


*UPDATE 2017-06-29 22:45*

Lustre now looks stable again. Queues have been re-enabled.

2017-06-28: Maintenance on the Lustre file system finished, queues running again

  • Posted on: 20 June 2017
  • By: ake

On Wednesday, June 28, 08:00-17:00, we will perform further maintenance on our Lustre file system.

This time two of the internal UPSes need to be replaced.

To be on the safe side we will take the Lustre file system offline during this process.

This means that no jobs will be allowed to run after 08:00.


*Update 2017-06-28 15:00*

The maintenance is finished and the queues are running again.

2017-06-13: Resolved: PFS outage. Systems back to normal.

  • Posted on: 13 June 2017
  • By: zao

The PFS file system is having some server problems and is currently not accessible.
Due to this, it is not possible to log in to the login nodes.

The batch queues are suspended as we work on this.

We will update this news entry as we make progress and/or resolve the problem.

2017-06-13 17:45
The failing hardware has been reported to our hardware vendor.

2017-06-14 12:24
We are working with the vendor to resolve the problem.


Updated: 2017-12-06, 15:21