Manipulating large arrays with HDF5 and PyTables

NumPy arrays can be persistently saved on disk using built-in functions in NumPy such as np.savetxt,, or np.savez, and loaded in memory using analogous functions. These methods are best when the arrays contain less than a few million points. For larger arrays, these methods suffer from two major problems: they become too slow, and they require the arrays to be fully loaded in memory. Arrays containing billions of points can be too big to fit in system memory, and alternative methods are required.

These alternative methods rely on memory mapping: the array resides on the hard drive, and chunks of the array are selectively loaded in memory as soon as the CPU needs them. This technique is memory-efficient, ...

Get IPython Interactive Computing and Visualization Cookbook now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.