TensorFlow 1.5.0 now installed on Kebnekaise for both CPU and GPU
TensorFlow 1.5.0 has now been installed on Kebnekaise.
There is both a CPU and a GPU version.
Note that the module has changed name and is now TensorFlow with capital T and F.
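With the renamed module, loading it now uses the capitalized spelling. A minimal sketch, assuming the usual module commands on Kebnekaise (the exact version and any prerequisite toolchain modules may differ; check with `module spider`):

```shell
# Assumed invocation; use "module spider TensorFlow" to see the
# exact versions and prerequisite toolchains available on the system.
module spider TensorFlow        # list available TensorFlow versions
module load TensorFlow/1.5.0    # new name with capital T and F
                                # (the old lowercase spelling no
                                # longer matches the module name)
```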
We are currently experiencing problems with Slurm (our batch system). The cause is a Slurm upgrade that did not work as well as we had hoped.
We are working on the problem at the moment. This news item will be updated when we have more information.
*Update*
17:50 - The batch system should now work again.
We are currently experiencing problems with the /pfs/nobackup file system.
One of the storage units is misbehaving, causing problems when accessing files located on it.
Because of this, the batch queues on both Abisko and Kebnekaise have been stopped.
Login may hang for users whose login script (.bashrc) tries to access the /pfs/nobackup file system.
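One defensive pattern for such cases is to guard file-system access in the login script behind a quick, time-limited check. A sketch, assuming a Linux system with coreutils `timeout` (the `PROJ_STORAGE` variable and path below are illustrative examples, not an official recommendation):

```shell
# Example ~/.bashrc fragment: only touch /pfs/nobackup if a short,
# time-limited check succeeds, so a hung file system cannot block login.
PFS=/pfs/nobackup
if timeout 2 test -d "$PFS" 2>/dev/null; then
    # Safe to reference paths under /pfs/nobackup here.
    export PROJ_STORAGE="$PFS/home/$USER"   # hypothetical variable
fi
```

The `timeout 2` bounds how long the `test -d` probe may take; if the file system is unresponsive, the probe is killed after two seconds and the guarded block is simply skipped.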
We currently have no estimate for when the problem will be solved.
* UPDATE: 2018-01-24 20:00 *
The problem has now been solved.
We currently have some problems with cluster login. This affects all users intermittently.
We are working on a solution and will update this news when we have more information.
UPDATE: The problem has been solved.
During the coming days we will be upgrading the kernel on all systems due to the Meltdown/Spectre security bugs.
This may cause some disruption to the normal behaviour of the systems, such as slow or temporarily unavailable file system access, stopped queues, and the like.
We will try to minimize user-visible effects, but this is an upgrade we must do as quickly as possible, and at times we may have to do things in a way that somewhat degrades the user experience.
On Tuesday 2017-12-19 there will be maintenance on the cooling system for the room housing our cluster storage (/pfs/nobackup).
This means that we have to take that storage down and also the clusters and login nodes.
There is currently a reservation on all nodes of both clusters starting 2017-12-19 07:00 CET.
Any jobs that have a walltime requirement that would extend into that maintenance window will not be allowed to start.
We are having some infrastructure outages and it is not currently possible to log in to the clusters.
We're looking into the problem and will update this news item when more information is available.
We are currently experiencing problems with our support address <support@hpc2n.umu.se>. This seems to be SNIC-wide, and they are working on it. This news item will be updated when more information is available.
Update 2017-11-15 14:20
The problem should now be solved, and it should be possible to send support requests to our support address again.
Today, 2017-10-18 13:40, we switched over the software stack on Kebnekaise to one that has been rebuilt from scratch.
All user-level applications and most libraries and helper programs were rebuilt.
We have tried to make sure that nothing our users depend on is missing, but should any job fail due to missing libraries or similar, please notify support@hpc2n.umu.se immediately and we will fix the problem.
On Sep 4th 08:00 CEST we will have a maintenance window to change some parameters for the lustre file system.
To be able to do this we need to empty the clusters from running jobs and reboot all nodes including the login nodes.
As we get closer to that point in time, jobs will not be allowed to start if their requested runtime is too long to fit before 2017-09-04 08:00.
In other words, submitting jobs with shorter runtimes will be a good idea.
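In practice, that means requesting a walltime short enough to finish before the window opens. A hypothetical batch script sketch (the project account, core count, and program name are placeholders, not real values):

```shell
#!/bin/bash
# Hypothetical Slurm batch script: a job submitted on 2017-09-03
# with a 4-hour walltime can still start before the 08:00 maintenance
# reservation, while a multi-day request would be held until after it.
#SBATCH -A SNIC2017-x-yy       # placeholder project account
#SBATCH -n 28                  # placeholder core count
#SBATCH --time=04:00:00        # short requested runtime

srun ./my_program              # placeholder executable
```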