HDF5
Policy
HDF5 is freely available to users at HPC2N.
General
HDF5 is a data model, library, and file format for storing and managing data.
Description
HDF5 supports an unlimited variety of datatypes and is designed for flexible, efficient I/O and for high-volume, complex data. It is portable and extensible, allowing applications to evolve in their use of HDF5.
The HDF5 technology suite includes:
- A versatile data model that can represent very complex data objects and a wide variety of metadata.
- A completely portable file format with no limit on the number or size of data objects in the collection.
- A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
- A rich set of integrated performance features that allow for access time and storage space optimizations.
- Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.
Availability
At HPC2N, HDF5 is available as a module on Akka and Abisko.
Usage at HPC2N
To use the HDF5 module, add it to your environment:
module add hdf5/<compiler>
Here <compiler> is one of intel | pgi | psc | gcc. You must specify the compiler; loading the module also loads that compiler. Several versions of HDF5 are installed, and you can list them with the command
module avail hdf5
Loading the module should set the path and any needed environment variables.
There are example programs in the examples directory of the HDF5 installation. They can be found here: /afs/hpc2n.umu.se/lap/hdf5/1.8.7/src/hdf5-1.8.7/dist/examples (or similar for other versions).
To use HDF5, add calls to the library in your program. Which calls you need depends on whether you want to create a new data file, read from an existing data file, or write to an existing data file.
Remember to include the HDF5 header file (or, in Fortran, use the HDF5 module) at the top of your program:
#include "hdf5.h" (C)
#include "H5Cpp.h" (C++)
USE HDF5 (Fortran)
File Access Modes
- H5Fcreate accepts H5F_ACC_EXCL or H5F_ACC_TRUNC
- H5Fopen accepts H5F_ACC_RDONLY or H5F_ACC_RDWR
- H5F_ACC_EXCL: If the file already exists, H5Fcreate fails. If the file does not exist, it is created and opened with read-write access. (Default)
- H5F_ACC_TRUNC: If the file already exists, it is opened with read-write access, and new data will overwrite any existing data. If the file does not exist, it is created and opened with read-write access.
- H5F_ACC_RDONLY: An existing file is opened with read-only access. If the file does not exist, H5Fopen fails. (Default)
- H5F_ACC_RDWR: An existing file is opened with read-write access. If the file does not exist, H5Fopen fails.
Creating a file
- Define the file creation property list
- Define the file access property list
- Create the file
More information about creating a file, opening an existing file, and closing a file can be found in the HDF5 documentation.
Examples from HDF homepage
Creating an HDF5 file using property list defaults
file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL, H5P_DEFAULT, H5P_DEFAULT);
Creating an HDF5 file using property lists
fcplist_id = H5Pcreate (H5P_FILE_CREATE);
/* ...set desired file creation properties... */
faplist_id = H5Pcreate (H5P_FILE_ACCESS);
/* ...set desired file access properties... */
file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL, fcplist_id, faplist_id);
Opening an HDF5 file (read-only access)
faplist_id = H5Pcreate (H5P_FILE_ACCESS);
status = H5Pset_fapl_stdio (faplist_id);
file_id = H5Fopen ("SampleFile.h5", H5F_ACC_RDONLY, faplist_id);
Closing an HDF5 file
status = H5Fclose (file_id);
Viewing a file with h5dump
Included with the HDF5 distribution is a command-line utility called h5dump, a program for inspecting the contents of an HDF5 file. It displays ASCII output formatted according to the HDF5 DDL grammar.
Displaying the content of file.h5:
h5dump file.h5
This is how a newly created file looks before any groups or datasets have been created and before any data has been written:
HDF5 "file.h5" {
GROUP "/" {
}
}
You can read more about h5dump in the HDF5 tools documentation.
The HDF5 DDL grammar is described in the HDF5 documentation.
File Function Summaries
- Table of general library functions, macros (H5)
- Table of file functions (H5F)
- File creation property list functions (H5P)
- File access property list functions (H5P)
- File driver functions (H5P)
File Property Lists
Additional information regarding file structure and access is passed to H5Fcreate and H5Fopen through property list objects. Property lists provide a portable and extensible method of modifying file properties via simple API functions. There are two kinds of file-related property lists:
- File creation property lists
- File access property lists
You can read more about file property lists in the HDF5 User Guide.
Code Examples
The following examples (for version 1.8.X) are taken from the HDF5 User Guide - Code Examples.
- Reading/writing a chunked dataset (Chunking refers to a storage layout where a dataset is partitioned into fixed-size multi-dimensional chunks.) [C - view] [C - download] [Fortran - view] [Fortran - download]
- Reading/writing a compact dataset [C - view] [C - download] [Fortran - view] [Fortran - download]
- Reading/writing an external dataset [C - view] [C - download] [Fortran - view] [Fortran - download]
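To make the idea of chunking concrete, the index arithmetic behind a chunked layout can be sketched in plain C. This sketch does not call HDF5 and says nothing about the HDF5 on-disk format; the type and function names are hypothetical. In the HDF5 C API itself, chunk dimensions are set with H5Pset_chunk on a dataset creation property list.

```c
#include <stddef.h>

/* Position of one element inside a chunked two-dimensional layout. */
typedef struct {
    size_t chunk_row, chunk_col;    /* which chunk holds the element */
    size_t offset_row, offset_col;  /* position inside that chunk */
} chunk_pos;

/* Map element (row, col) to its chunk, given fixed chunk dimensions. */
chunk_pos locate_in_chunk(size_t row, size_t col,
                          size_t chunk_rows, size_t chunk_cols)
{
    chunk_pos p;
    p.chunk_row  = row / chunk_rows;
    p.chunk_col  = col / chunk_cols;
    p.offset_row = row % chunk_rows;
    p.offset_col = col % chunk_cols;
    return p;
}

/* Number of chunks needed to cover `extent` elements along one dimension.
 * A partially filled chunk at the edge still counts as a whole chunk. */
size_t chunks_along(size_t extent, size_t chunk_extent)
{
    return (extent + chunk_extent - 1) / chunk_extent;
}
```

As an illustration with assumed sizes, a 6 x 8 dataset with 4 x 4 chunks needs 2 x 2 chunks, and element (5, 7) lies in chunk (1, 1) at offset (1, 3).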
Compiling
First, you must load the HDF5 module. This is done with
module load hdf5/<compiler>/<version>
You must specify compiler and version. You can see a list with
module avail hdf5
Then, you compile with
- h5cc hdf5_program.c (for C programs)
- h5c++ hdf5_program.cpp (for C++ programs)
- h5fc program.f90 (for Fortran 90 programs)
You can get more help with this by typing
module help hdf5/<compiler>/<version>
Examples, compiling, running
HDF5, version 1.8, Pathscale compiler.
C (h5ex_d_chunk.c - get from download link above)
p-bc9901 [~]$ module load hdf5/psc/1.8
p-bc9901 [~]$ h5cc h5ex_d_chunk.c -o h5ex_d_chunk
p-bc9901 [~]$ ./h5ex_d_chunk
Original Data:
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
Storage layout for DS1 is: H5D_CHUNKED
Data as written to disk by hyberslabs:
[ 0 1 0 0 1 0 0 1]
[ 1 1 0 1 1 0 1 1]
[ 0 0 0 0 0 0 0 0]
[ 0 1 0 0 1 0 0 1]
[ 1 1 0 1 1 0 1 1]
[ 0 0 0 0 0 0 0 0]
Data as read from disk by hyperslab:
[ 0 1 0 0 0 0 0 1]
[ 0 1 0 1 0 0 1 1]
[ 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0]
[ 0 1 0 1 0 0 1 1]
[ 0 0 0 0 0 0 0 0]
p-bc9901 [~]$
Fortran 90 (h5ex_d_chunk.f90 - get from download link above)
p-bc9901 [~]$ module load hdf5/psc/1.8
p-bc9901 [~]$ h5fc h5ex_d_chunk.f90 -o h5ex_d_chunk
p-bc9901 [~]$ ./h5ex_d_chunk
Original Data:
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
Storage layout for DS1 is: H5D_CHUNKED
Data as written to disk by hyberslabs:
[ 0 1 0 0 1 0 0 1 ]
[ 1 1 0 1 1 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 1 0 0 1 0 0 1 ]
[ 1 1 0 1 1 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
Data as read from disk by hyperslab:
[ 0 1 0 0 0 0 0 1 ]
[ 0 1 0 1 0 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 1 0 1 0 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
p-bc9901 [~]$
Additional info
You can find more information at the following locations:
