Python and HDF5

by Andrew Collette
November 2013
Content level: Intermediate to advanced
148 pages
3h 21m
English
O'Reilly Media, Inc.

Chapter 3. Working with Datasets

Datasets are the central feature of HDF5. You can think of them as NumPy arrays that live on disk. Every dataset in HDF5 has a name, a type, and a shape, and supports random access. Unlike NumPy's np.save and friends, there’s no need to read and write the entire array as a block; you can use standard NumPy slicing syntax to read and write just the parts you want.
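As a minimal sketch of that partial access, the following writes a small block into a larger on-disk dataset and then reads back only part of one row. The file name, the dataset name "grid", and the shapes are assumptions chosen for illustration; a temporary directory is used so nothing is left in the working directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "slicing_demo.hdf5")

with h5py.File(path, "w") as f:
    # Create a 100x100 float64 dataset; unwritten elements read back as 0.
    dset = f.create_dataset("grid", shape=(100, 100), dtype="f8")
    # Write only a 2x3 block; the rest of the dataset is left untouched.
    dset[10:12, 20:23] = np.arange(6.0).reshape(2, 3)

with h5py.File(path, "r") as f:
    # Read just six elements of row 10 rather than the whole array.
    row_part = f["grid"][10, 18:24]

print(row_part)
```

Only the requested slice is returned as an in-memory NumPy array; the remaining elements of the dataset are not loaded.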

Dataset Basics

First, let’s create a file so we have somewhere to store our datasets:

>>> import h5py
>>> f = h5py.File("testfile.hdf5", "w")  # "w" creates the file; recent h5py versions open read-only by default

Every dataset in an HDF5 file has a name. Let’s see what happens if we just assign a new NumPy array to a name in the file:

>>> import numpy as np
>>> arr = np.ones((5, 2))
>>> f["my dataset"] = arr
>>> dset = f["my dataset"]
>>> dset
<HDF5 dataset "my dataset": shape (5, 2), type "<f8">

We put in a NumPy array but got back something else: an instance of the class h5py.Dataset. This is a “proxy” object that lets you read and write to the underlying HDF5 dataset on disk.
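The difference between the proxy and an in-memory array can be sketched as follows. This is an illustration under assumptions rather than an excerpt from the book: the temporary file path is hypothetical, and `dset[...]` is used here to read the full contents into a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "proxy_demo.hdf5")

with h5py.File(path, "w") as f:
    f["my dataset"] = np.ones((5, 2))
    dset = f["my dataset"]

    # The object we get back is the proxy, not an array.
    is_proxy = isinstance(dset, h5py.Dataset)

    # Slicing the proxy reads data from the file into a NumPy array.
    in_memory = dset[...]
    is_array = isinstance(in_memory, np.ndarray)

    # Editing the in-memory copy does not touch the file...
    in_memory[0, 0] = 42.0
    on_disk_after_copy_edit = float(dset[0, 0])

    # ...but assigning through the proxy writes to the dataset.
    dset[0, 0] = 7.0
    on_disk_after_proxy_write = float(dset[0, 0])
```

Each read through the proxy returns a fresh copy of the data, so changes reach the file only when they are assigned through the Dataset object itself.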

Type and Shape

Let’s explore the Dataset object. If you’re using IPython, type dset. and hit Tab to see the object’s attributes; otherwise, do dir(dset). There are a lot, but a few stand out:

>>> dset.dtype
dtype('float64')

Each dataset has a fixed type that is defined when it’s created and can never be changed. HDF5 has a vast, expressive type mechanism that can easily handle the built-in NumPy types, with few exceptions. For this reason, h5py always expresses the type of a dataset using standard NumPy dtype objects.
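To illustrate the fixed type, the sketch below creates a dataset with an explicit dtype and then writes integer data into it. The file path and the dataset name "counts" are assumptions made for this example. The values are converted on write, while the dataset's own dtype stays as it was at creation.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical path and dataset name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "dtype_demo.hdf5")

with h5py.File(path, "w") as f:
    # The type is chosen once, when the dataset is created.
    dset = f.create_dataset("counts", shape=(4,), dtype=np.float32)

    # Writing int64 data converts the values to the dataset's type.
    dset[...] = np.array([1, 2, 3, 4], dtype=np.int64)

    stored_dtype = dset.dtype   # still float32
    values = dset[...]          # read back as a float32 NumPy array

print(stored_dtype, values.dtype)
```

To store the same data with a different type, one would typically create a new dataset with the desired dtype and copy the values across.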

There’s ...


ISBN: 9781491944981