GPFS problems (updates here when available)

  • Posted on: 27 April 2016
We are currently having some problems with GPFS, the parallel filesystem, which we are investigating.

The queues on all clusters have been stopped.

Update 12:46
The problematic disk has been suspended, and no new data will be written to it.
Existing data that is not damaged can still be read.
Data is being migrated away from the problematic disk.
We currently do not have an estimate of how long this will take.

Update 16:22
The data migration will take at least until Thursday evening.
We will keep the batch queue blocked from starting new jobs until the filesystem has been checked.
We will also block access to /pfs/nobackup from the login nodes of the clusters starting at 17:00.

Update 2011-12-02 09:30
We are now running a check of the filesystem to make sure it is in a sane state before returning it to service. This will take several hours.

Update 2011-12-02 13:50
The filesystem check is now done and we are remounting the filesystem.
Everything is expected to be back around 14:15.

