File systems and storage

 

AFS - Andrew File System

Your home directory (i.e. the directory pointed to by the $HOME variable) is located on an AFS file system. This file system is backed up regularly.

Note that ticket forwarding to batch jobs does not work, so the only AFS access possible from a batch job is reading files from your Public directory, which is world-readable (yes, by the entire world). Use the GPFS 'parallel' file system for data management in conjunction with batch jobs.

To find the path to your home directory, either run pwd just after logging in, or change to your home directory first and then run pwd:

p-bc9901 [~/C]$ cd
p-bc9901 [~]$ pwd
/home/u/username
p-bc9901 [~]$ 

If you need more space in your home directory, contact support@hpc2n.umu.se and include an explanation of what you need the extra space for.

See AFS at HPC2N for further explanation of AFS.

GPFS ('parallel') File System

There is a GPFS file system available on all clusters.

Apart from your usual home directory you also have file space in the parallel file system. This file system is set up in "parallel" to the usual home tree, but starting from /pfs/nobackup instead. Thus, to create a soft link from your home directory to your corresponding home on the parallel file system, you could issue the following command:

$ ln -s /pfs/nobackup$HOME $HOME/pfs

Now, if you do

$ cd ~/pfs

you will end up in your "parallel" home directory.
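If the link might already exist, a guarded variant avoids an error or an accidental nested link. The following is a minimal sketch; link_if_missing is a hypothetical helper name, not an HPC2N tool.

```shell
# Sketch: create a symbolic link only if nothing exists at the link path yet.
# link_if_missing is a hypothetical helper used for illustration.
link_if_missing() {
    target=$1
    link=$2
    if [ -e "$link" ] || [ -L "$link" ]; then
        # Something (possibly a dangling link) is already there; leave it alone.
        echo "exists: $link"
    else
        ln -s "$target" "$link" && echo "created: $link -> $target"
    fi
}

# Usage on the cluster (not run here):
# link_if_missing "/pfs/nobackup$HOME" "$HOME/pfs"
```

The same helper should work for the ~/hsm link described further down, with /hsm$HOME as the target.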

Your home directory on the parallel file system is very useful, since batch jobs can create files there without any Kerberos ticket or manipulation of permissions. Moreover, the parallel file system offers high performance when accessed from the nodes, making it suitable for data that is to be accessed from parallel jobs.

Note that the parallel file system is not intended for permanent storage and there is NO BACKUP of /pfs/nobackup. If the file system fills up, files that have been unused for some time might be deleted without warning.
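To see which of your files might be candidates for such a cleanup, you could list files that have not been accessed for a while and move the ones you want to keep to safer storage. This is only a sketch: list_stale is a hypothetical helper, the 30-day default is an assumption (the actual criteria for deletion are not stated here), and access times may be unreliable on file systems that do not record them.

```shell
# Sketch: list regular files under a directory whose last access is more than
# N days old. list_stale and the default of 30 days are assumptions made for
# illustration only.
list_stale() {
    dir=$1
    days=${2:-30}
    find "$dir" -type f -atime +"$days"
}

# Usage on the cluster (not run here):
# list_stale "/pfs/nobackup$HOME" 60
```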

To prevent runaway programs from filling the file system, we have enabled quotas with a 1500 GB soft limit and a 2000 GB hard limit.
If these limits are too small, contact support@hpc2n.umu.se and include an explanation of what you need the extra space for.

HSM - Hierarchical Storage Management (tape backed disk frontend)

This is the Hierarchical Storage Management (HSM) file system. HSM means that the file system moves files that are not currently in use to tape. It is intended for archiving LARGE files that you do not need high-speed access to, i.e. large results that you want to keep on a file system safer than the scratch file systems (which are not backed up, see above) but that are too big for your usual home directory.

Store large files on HSM: at least 100 MB, but preferably 1 GB or more. This is because the general recall time of a file is

120+size_in_mb/30

seconds, so the fixed 120-second overhead dominates for small files, and it is MUCH more efficient to store a few large archives than many small files on HSM.
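As a rough worked example of the formula above, the following sketch compares recalling one 3000 MB archive with recalling the same data stored as 100 files of 30 MB each. It assumes the files are recalled one after another, which is an assumption made for illustration; recall_seconds is a hypothetical helper name.

```shell
# Estimated recall time in whole seconds for a file of the given size in MB,
# using the formula from the text: 120 + size_in_mb / 30.
# Integer arithmetic, so the result is rounded down.
recall_seconds() {
    echo $(( 120 + $1 / 30 ))
}

# One 3000 MB archive.
one_archive=$(recall_seconds 3000)
# 100 files of 30 MB each, assumed to be recalled sequentially.
many_files=$(( 100 * $(recall_seconds 30) ))

echo "one 3000 MB archive: ${one_archive} s"   # 220 s
echo "100 files of 30 MB:  ${many_files} s"    # 12100 s
```

Under these assumptions the single archive comes back in under four minutes, while the small files take well over three hours in total.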

If you have several small files you need to move out of the way, use GNU tar to create an archive of them and put that archive on HSM. If the file size of the resulting archive is less than 100 MB, then you should archive your results in larger chunks.

A quick introduction to tar:

Create compressed archive: tar -cvzf archive.tar.gz a-file a-directory more-files more-directories
View contents (file list) of compressed archive: tar -tzf archive.tar.gz
Extract compressed archive into current directory: tar -xvzf archive.tar.gz

More information is available in the man page; run man tar.
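Before moving an archive to HSM, it can be worth checking that it reaches the recommended minimum size. The following is a minimal sketch; big_enough_for_hsm is a hypothetical helper, and treating 1 MB as 1024*1024 bytes is an assumption.

```shell
# Sketch: succeed if the archive is at least min_mb megabytes (default 100).
# big_enough_for_hsm is a hypothetical helper used for illustration;
# 1 MB is taken to be 1024*1024 bytes here, which is an assumption.
big_enough_for_hsm() {
    archive=$1
    min_mb=${2:-100}
    # Wrapping wc in $(( )) strips any padding some wc implementations add.
    bytes=$(( $(wc -c < "$archive") ))
    [ "$bytes" -ge $(( min_mb * 1024 * 1024 )) ]
}

# Usage on the cluster (not run here; file names are placeholders):
# tar -czf results.tar.gz run-001 run-002
# if big_enough_for_hsm results.tar.gz; then
#     mv results.tar.gz ~/hsm/
# else
#     echo "archive is under 100 MB, consider packing more results together"
# fi
```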

If you have any questions regarding suitable ways to archive your results on the HSM file system, please contact support@hpc2n.umu.se.

The upper limit on file size is around 500 GB due to the limited size of the HSM frontend file system. Use the df -k ~/hsm/ command to see how much space is currently available in the HSM frontend file system.

If you intend to store more than 5 TB (5000 GB) on HSM, please contact support@hpc2n.umu.se in advance.

We strongly discourage storing more than approximately 10000 files on HSM. If you need to store that many files, please investigate ways to archive your results in larger chunks.
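A quick way to check how close you are to that number is to count the regular files under a directory. This is a minimal sketch using standard tools; count_files is a hypothetical helper, and file names containing newlines would be counted more than once.

```shell
# Sketch: print the number of regular files below a directory.
# count_files is a hypothetical helper used for illustration.
count_files() {
    # $(( )) strips the padding some wc implementations add to their output.
    echo $(( $(find "$1" -type f | wc -l) ))
}

# Usage on the cluster (not run here):
# count_files ~/hsm/
```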

Files on HSM are automatically migrated to tape storage when the frontend disk system gets too full. The HSM file system is also backed up.

The HSM storage space is available as ~/hsm. If, for some reason, there is no such link, you can create it with:

$ ln -s /hsm$HOME $HOME/hsm

Please be aware that accessing a file from HSM storage might take a VERY long time, since the file might have been migrated to tape and all tape drives could be busy. Also note that the retrieval time per file includes a large fixed overhead and is therefore rather constant for small files, so be sure to use tar or a similar program to pack multiple small files into larger archives.

To help assess HSM usage, we have written a small tool called hsmusage that can be handy. Simply run hsmusage directoryname to get a summary of usage broken down by file size.

/scratch

On some of the computers at HPC2N there is a directory called /scratch. It is a local disk area, usually fairly fast and large. It is intended for (temporary) files you create or need during your computations. Please do not leave files in /scratch that are not needed by jobs you are currently running on the machine, and please make sure your job removes any temporary files it creates.
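One way to make sure a job cleans up after itself is to do all work in a private directory under /scratch and remove it with an exit trap. The following is a minimal sketch; run_in_scratch and the SCRATCH_ROOT variable are hypothetical names, and the trap only fires when the shell exits normally or with an error, not if it is killed outright.

```shell
# Sketch: run a command inside a private temporary directory under /scratch
# and remove that directory when the command finishes.
# run_in_scratch and SCRATCH_ROOT are hypothetical names for illustration;
# SCRATCH_ROOT defaults to /scratch.
run_in_scratch() {
    (
        workdir=$(mktemp -d "${SCRATCH_ROOT:-/scratch}/job.XXXXXX") || exit 1
        # Remove the working directory when this subshell exits.
        trap 'rm -rf "$workdir"' EXIT
        cd "$workdir" || exit 1
        "$@"
    )
}

# Usage inside a job script (not run here; my_program is a placeholder):
# run_in_scratch my_program --input "/pfs/nobackup$HOME/input.dat"
```

Results you want to keep must be copied out of the working directory by the command itself, since everything left there is deleted.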

When someone needs more space than is available on /scratch, we will remove the oldest/largest files without notice.

There is NO backup of /scratch.