Python and HDF5 by Andrew Collette


Chapter 3. Working with Datasets

Datasets are the central feature of HDF5. You can think of them as NumPy arrays that live on disk. Every dataset in HDF5 has a name, a type, and a shape, and supports random access. Unlike NumPy’s own np.save and friends, there’s no need to read and write the entire array as a block; you can use the standard NumPy syntax for slicing to read and write just the parts you want.
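As a minimal sketch of that partial I/O, the snippet below writes a small block into a larger on-disk dataset and reads back only a short segment of one row. The scratch file path and the dataset name "grid" are made up for illustration, and create_dataset is used here only as a convenient way to get an empty dataset to slice into.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "slicing_demo.hdf5")

with h5py.File(path, "w") as f:
    # A 100x100 float64 dataset on disk; unwritten elements read back as 0.
    dset = f.create_dataset("grid", shape=(100, 100), dtype="f8")

    # Write only a 2x3 block; the rest of the dataset is left alone.
    dset[10:12, 20:23] = np.arange(6.0).reshape(2, 3)

    # Read back part of one row as an ordinary in-memory NumPy array.
    row = dset[10, 18:25]

print(row)
```

Only the selected elements are transferred in each direction, and the result of the read is a regular NumPy array that remains usable after the file is closed.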

Dataset Basics

First, let’s create a file so we have somewhere to store our datasets:

>>> import h5py
>>> f = h5py.File("testfile.hdf5", "w")  # "w" creates the file; h5py 3.x opens read-only by default

Every dataset in an HDF5 file has a name. Let’s see what happens if we just assign a new NumPy array to a name in the file:

>>> import numpy as np
>>> arr = np.ones((5,2))
>>> f["my dataset"] = arr
>>> dset = f["my dataset"]
>>> dset
<HDF5 dataset "my dataset": shape (5, 2), type "<f8">

We put in a NumPy array but got back something else: an instance of the class h5py.Dataset. This is a “proxy” object that lets you read and write to the underlying HDF5 dataset on disk.
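To make the distinction between the proxy and the data concrete, here is a small self-contained sketch. The scratch file path is hypothetical; the rest mirrors the session above. It checks that the handle is an h5py.Dataset, that slicing it yields an in-memory NumPy array, and that only assignments made through the handle reach the file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "proxy_demo.hdf5")

with h5py.File(path, "w") as f:
    f["my dataset"] = np.ones((5, 2))
    dset = f["my dataset"]

    # The handle is a Dataset, not an ndarray.
    is_proxy = isinstance(dset, h5py.Dataset)

    # Slicing with an Ellipsis reads the whole dataset into memory.
    in_memory = dset[...]
    is_array = isinstance(in_memory, np.ndarray)

    # Modifying the in-memory copy does not touch the file...
    in_memory[0, 0] = 42.0
    unchanged = dset[0, 0]

    # ...but assigning through the proxy writes to the dataset in the file.
    dset[0, 0] = 7.0
    written = dset[0, 0]

print(is_proxy, is_array, unchanged, written)
```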

Type and Shape

Let’s explore the Dataset object. If you’re using IPython, type dset. and hit Tab to see the object’s attributes; otherwise, do dir(dset). There are a lot, but a few stand out:

>>> dset.dtype
dtype('float64')

Each dataset has a fixed type that is defined when it’s created and can never be changed. HDF5 has a vast, expressive type mechanism that can easily handle the built-in NumPy types, with few exceptions. For this reason, h5py always expresses the type of a dataset using standard NumPy dtype objects.
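One consequence of the fixed type can be sketched as follows, again with a hypothetical scratch file: writing values of a different type into an existing dataset converts the values rather than changing the dataset. Here 32-bit integers are written into the float64 dataset from above, and the dtype reported before and after is the same.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "dtype_demo.hdf5")

with h5py.File(path, "w") as f:
    f["my dataset"] = np.ones((5, 2))   # stored as float64, like the array
    dset = f["my dataset"]
    dtype_before = dset.dtype

    # Write integers into the float dataset; they are converted on the way in.
    dset[0, :] = np.array([2, 3], dtype="int32")

    dtype_after = dset.dtype            # still float64
    first_row = dset[0, :]              # comes back as floats

print(dtype_before, dtype_after, first_row)
```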

There’s ...
