Chapter 6. Storing Metadata with Attributes

Groups and datasets are great for keeping data organized in a file. But the feature that really turns HDF5 into a scientific database, instead of just a file format, is attributes.

Attributes are pieces of metadata you can stick on objects in the file. They can hold equipment settings, timestamps, computation results, version numbers, or virtually anything else you want. They’re a key mechanism for making self-describing files. Unlike files in simple binary formats, which hold nothing but arrays of numbers, files with judiciously chosen metadata are scientifically useful all on their own.

Attribute Basics

You can attach attributes to any kind of object that is linked into the HDF5 tree structure: groups, datasets, and even named datatypes. To demonstrate, let’s create a new file containing a single dataset:

>>> import h5py
>>> f = h5py.File('attrsdemo.hdf5', 'w')
>>> dset = f.create_dataset('dataset', (100,))
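The claim that every linked object can carry attributes is easy to check. The following is a minimal sketch, assuming h5py 3.x and NumPy are installed. It writes to a temporary file rather than attrsdemo.hdf5, and the names `run3`, `sample_type`, and `note` are hypothetical. Assigning a NumPy dtype to a group member is how h5py stores a named (committed) datatype.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "attrs_any_object.hdf5")

with h5py.File(path, "w") as f:
    grp = f.create_group("run3")                  # a group
    dset = grp.create_dataset("dataset", (100,))  # a dataset
    f["sample_type"] = np.dtype("float32")        # a named datatype

    # Groups, datasets, and named datatypes all expose the same .attrs proxy.
    for obj in (grp, dset, f["sample_type"]):
        obj.attrs["note"] = "attached to " + type(obj).__name__

    # Read the hypothetical "note" attribute back from each object by path.
    notes = {name: f[name].attrs["note"]
             for name in ("run3", "run3/dataset", "sample_type")}

print(notes)
```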

Among the properties of the dset object, there’s one called .attrs:

>>> dset.attrs
<Attributes of HDF5 object at 73767504>

This is a little proxy object (an instance of h5py.AttributeManager) that lets you interact with attributes in a Pythonic way. As was the case with groups, the main thing to keep in mind here is that the attrs object works mostly like a Python dictionary.
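A minimal sketch of that dictionary-like behavior, assuming h5py 3.x: membership tests, `keys()`, `get()` with a default, `len()`, `del`, and iteration all work on the proxy. The temporary file and the missing `gain` attribute are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import h5py

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "attrs_dict_demo.hdf5")

with h5py.File(path, "w") as f:
    attrs = f.create_dataset("dataset", (100,)).attrs

    attrs["title"] = "Dataset from third round of experiments"
    attrs["sample_rate"] = 100e6

    names = sorted(attrs.keys())       # like dict.keys()
    has_title = "title" in attrs       # membership test
    missing = attrs.get("gain", None)  # .get() returns the default if absent
    count = len(attrs)                 # number of attributes on the object

    del attrs["title"]                 # delete an attribute like a dict entry
    remaining = list(attrs)            # iterating yields attribute names

print(names, has_title, missing, count, remaining)
```

The values are collected inside the `with` block because the proxy is only usable while the file is open.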

For example, you can create a new attribute simply by assigning a name to a value:

>>> dset.attrs['title'] = "Dataset from third round of experiments"
>>> dset.attrs['sample_rate'] = 100e6    # 100 MHz ...
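Because attributes are stored in the file itself, a later session can recover them with no outside information, which is what makes the file self-describing. The sketch below assumes h5py 3.x and NumPy; the temporary path and the `channels` attribute are hypothetical additions. Note that a scalar written from a Python float comes back as a NumPy scalar, and an array comes back as a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "attrsdemo.hdf5")

# Write: attach metadata when the data is created.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("dataset", (100,))
    dset.attrs["title"] = "Dataset from third round of experiments"
    dset.attrs["sample_rate"] = 100e6
    dset.attrs["channels"] = np.array([0, 1, 2])  # hypothetical array-valued attribute

# Read: reopen read-only and recover the metadata from the file alone.
with h5py.File(path, "r") as f:
    attrs = f["dataset"].attrs
    title = attrs["title"]
    rate = attrs["sample_rate"]
    channels = attrs["channels"]

print(title)
print(type(rate), rate)
print(channels)
```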
