Chapter 8. Organizing Data with References, Types, and Dimension Scales

Your files aren’t just a collection of groups, datasets, and attributes. Some of the best features in HDF5 are those that help you to express relationships between pieces of your data.

Maybe one of your datasets provides the x-axis for another, and you’d like to express that in a way your colleagues can easily figure out. Maybe you want to record which regions of a particular dataset are of interest for further processing. Or maybe you just want to store a bunch of links to datasets and groups in the file, without having to worry about getting all the paths right.

This chapter covers three of the most useful constructs in HDF5 for linking your various objects together into a scientifically useful whole. References, the HDF5 “pointer” type, are a great way to store links to objects as data. Named types let you enforce type consistency across datasets. And Dimension Scales, an HDF5 standard, let you attach physically meaningful axes to your data in a way third-party programs can understand.

Let’s get started with the simplest relational widget in HDF5: object references.

Object References

We’ve already seen how links in a group serve to locate objects. But there’s another mechanism for locating objects: the reference. Crucially, a reference is a value, so it can be stored as data in things like attributes and datasets.
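As a rough illustration of the idea that a reference can be stored as data, here is a minimal sketch using h5py. The file, group, dataset, and attribute names (refs_sketch.hdf5, group1, mydata, target) are made up for this example and do not come from the text. The file also lives only in memory, an assumption made so that the sketch leaves nothing on disk.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store): nothing is written to disk.
# All names below are hypothetical, chosen only for this sketch.
with h5py.File('refs_sketch.hdf5', 'w', driver='core', backing_store=False) as f:
    grp = f.create_group('group1')
    dset = f.create_dataset('mydata', data=np.arange(5))

    # Groups and datasets expose an object reference through the .ref property.
    ref = dset.ref

    # A reference is a value, so it can be stored in an attribute.
    grp.attrs['target'] = ref

    # Read the reference back and resolve it by indexing the file with it.
    stored = grp.attrs['target']
    resolved = f[stored]

    # Copy out what we need before the file closes.
    resolved_name = resolved.name
    resolved_values = resolved[...].tolist()

print(resolved_name)
print(resolved_values)
```

The stored reference resolves to the dataset itself rather than to a path string, so the printed name is whatever path the object currently has in the file.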

Creating and Resolving References

Let’s create a simple file with a couple of groups and a dataset:

>>> f = h5py.File('refs_demo.hdf5' ...
