Parallel HDF5 writing for Python 3 (h5py)

Python is a powerful tool that can link to other modules with little effort once you understand the underlying mechanism. However, parallel HDF5 writing is not included in the default h5py package. If you want several processes to write to the same file at the same time, you have to compile HDF5 and h5py from source with an MPI compiler. I spent several days testing this and wrote these notes to remind myself how to install it again.

I know references are everywhere on the Internet, but it is horrible to dig through them again and reinvent the wheel.

Parallel HDF5

Installed environment and target version

  • 5.4.14-100.fc30.x86_64
  • hdf5-1.10.6

Prerequisites (MPI; mpi4py)

  1. Install the development tools (if you have problems compiling, you might need the "Development Tools" and "Development Libraries" groups).

    sudo dnf groupinstall "Development Tools" "Development Libraries"

  2. Install **openmpi** and **openmpi-devel** from dnf.

    According to the official documentation for installing parallel HDF5, we need an MPI compiler with MPI-IO support. There are two options: openmpi or mpich.

    $ sudo dnf install openmpi openmpi-devel
    $ module avail
    $ module load mpi/openmpi-x86_64

  3. Alternatively, install mpich and mpich-devel from dnf instead.

    $ sudo dnf install mpich mpich-devel
    $ module avail
    $ module load mpi/mpich-x86_64

  4. Load the openmpi module with module load, as in step 2 (ref.: Fedora version is 25).

  5. Install mpi4py from source! (You might run into compile problems later if you install mpi4py with pip3.[^1])

    $ python3 setup.py build --mpicc=/where/you/have/mpicc
    $ python3 setup.py install

  6. Make sure that mpi4py works!

    demo.py

    from mpi4py import MPI
    print("Hello World (from process %d)" % MPI.COMM_WORLD.Get_rank())

    $ mpiexec -n 2 python3 demo.py
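
Before moving on to HDF5, the prerequisite steps above can be sanity-checked in one go. The following is only a minimal sketch, not part of the original procedure: the helper names `find_mpi_tools` and `describe_mpi4py` are made up for illustration, and it assumes the MPI module has already been loaded in the current shell.

```python
import shutil


def find_mpi_tools(names=("mpicc", "mpiexec")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def describe_mpi4py():
    """Return a short description of the MPI library that mpi4py is using."""
    try:
        from mpi4py import MPI
    except ImportError as exc:
        return "mpi4py is not importable: %s" % exc
    # Get_version() returns the MPI standard version as a (major, minor) tuple.
    major, minor = MPI.Get_version()
    library = MPI.Get_library_version().strip()
    return "MPI standard %d.%d, library: %s" % (major, minor, library)


if __name__ == "__main__":
    for name, path in find_mpi_tools().items():
        print("%-8s %s" % (name, path or "NOT FOUND (was 'module load' run?)"))
    print(describe_mpi4py())
```

If mpicc shows up as not found, the module from step 2 or 3 is probably not loaded in this shell. The library line should name the same MPI implementation (openmpi or mpich) that you intend to compile HDF5 with.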

Compile and install hdf5 from the source

  1. Download the source code from the HDF5 Group website and install it.[^2]

    Note: The source archive that I downloaded is hdf5-1.10.6.tar.bz2. I could not compile from the .zip file.

    Personal recommendation: Install this HDF5 build outside the system directories. I don't know whether the HDF5 from the default repository (e.g. installed with dnf) differs from the compiled one, so I installed a separate HDF5 outside the system libraries.

    If configure fails with the error "C++ preprocessor /lib/cpp fails sanity check", install glibc-headers and gcc-c++ from dnf.

    $ CC=/usr/lib64/openmpi/bin/mpicc ./configure --enable-parallel --enable-shared --prefix=<install-directory>
    $ make # build the library
    $ make check # verify the correctness
    $ make install

  2. Download the source code of h5py from GitHub and go to its directory. Install it.

    $ export CC=mpicc
    $ python3 setup.py configure --mpi --hdf5=/path/to/parallel/hdf5
    $ python3 setup.py build
    $ python3 setup.py install

  3. Make sure the package works.[^3]

    Create demo2.py with the following content.

    from mpi4py import MPI
    import h5py

    rank = MPI.COMM_WORLD.rank # The process ID (integer 0-3 for 4-process run)

    f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)

    dset = f.create_dataset('test', (4,), dtype='i')
    dset[rank] = rank

    f.close()

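    Run the following command; the number of processes should depend on the number of cores your computer has. Note that the dataset in demo2.py has only four elements, so the script as written suits at most four processes.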

    $ mpiexec -n 4 python3 demo2.py

    Check the result.

    $ h5dump parallel_test.hdf5
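
Besides h5dump, the result can also be checked from Python. This is a rough sketch under a few assumptions: the function name `check_parallel_h5py` is made up for illustration, the file and dataset names are the ones used in demo2.py, and the "expected" values assume that every element was written by the rank of the same index (that is, a 4-process run).

```python
def check_parallel_h5py(path="parallel_test.hdf5"):
    """Return a list of findings about the h5py build and the test file."""
    try:
        import h5py
    except ImportError as exc:
        return ["h5py is not importable: %s" % exc]

    findings = []
    # get_config().mpi is True only when h5py was compiled with MPI support.
    findings.append("h5py built with MPI: %s" % h5py.get_config().mpi)
    findings.append("linked HDF5 version: %s" % h5py.version.hdf5_version)

    try:
        # Reading back does not need the mpio driver; a serial open is enough.
        with h5py.File(path, "r") as f:
            values = f["test"][...].tolist()
    except (OSError, KeyError) as exc:
        findings.append("could not read %s: %s" % (path, exc))
        return findings

    # Each rank wrote its own number into its own slot: 0, 1, 2, ...
    expected = list(range(len(values)))
    status = "OK" if values == expected else "expected %s" % expected
    findings.append("dataset 'test' = %s (%s)" % (values, status))
    return findings


if __name__ == "__main__":
    for line in check_parallel_h5py():
        print(line)
```

If the first line reports that h5py was not built with MPI, Python is most likely picking up a different h5py (for example one installed from pip or dnf) rather than the one compiled above.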

Unresolved issue

  1. Why can't I install with openmpi?

    Some of the answers say that I can change the number of slots with the following steps:

    1. In the terminal, edit the default hostfile

      $ sudo vim /etc/openmpi-x86_64/openmpi-default-hostfile

    2. Add the following content in the file

      localhost slots=2

    However, I still cannot get the right result.
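
One thing that can at least be checked is how many slots the hostfile declares compared with the number of processes passed to mpiexec -n. The sketch below only parses hostfile text of the form shown above ("host slots=N", with # starting a comment). The function names are made up for illustration, and it does not explain or fix the problem itself.

```python
def parse_hostfile(text):
    """Return {host: slots} for each host line; slots is None when not given."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        slots = None
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots" and value.isdigit():
                slots = int(value)
        hosts[fields[0]] = slots
    return hosts


def enough_slots(hosts, nprocs):
    """True/False if the declared slots cover nprocs; None if undeclared."""
    if not hosts or any(s is None for s in hosts.values()):
        return None
    return sum(hosts.values()) >= nprocs


if __name__ == "__main__":
    sample = "# default hostfile\nlocalhost slots=2\n"
    hosts = parse_hostfile(sample)
    print(hosts)                   # {'localhost': 2}
    print(enough_slots(hosts, 4))  # False
```

With the hostfile line from step 2 and the four processes used for demo2.py, the declared slots are fewer than the requested processes. Whether that mismatch is the actual cause of the wrong result is not something I have confirmed.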

[^1]: Google discussion: about the mpi4py with different compile version.
[^2]: Installation instructions for Parallel HDF5.
[^3]: h5py testing.