Andrew Collette

Managing Large Datasets with Python and HDF5

Date: This event took place live on January 28, 2014

Presented by: Andrew Collette

Duration: Approximately 60 minutes.

Cost: Free

Questions? Please send email to


Are you using Python to process large numerical datasets? Over the past few years, the Hierarchical Data Format (HDF5) has emerged as the mechanism of choice for processing, archiving and sharing scientific datasets ranging from gigabytes to terabytes and beyond. With a diverse user base spanning the range from NASA to the financial industry, HDF5 lets you create high-performance, portable, self-describing containers for your data. HDF5's flexibility and speed make it particularly well-suited to analysis in Python.

This webcast provides a practical, Python-based introduction to the world of HDF5.

This webcast, led by Andrew Collette, will cover:

  • The basics of the format
  • Performance
  • Best practices for making sharable data files that colleagues on other platforms can read

About Andrew Collette

Andrew Collette holds a Ph.D. in physics from UCLA and works as a laboratory research scientist at the University of Colorado. He has worked with the Python-NumPy-HDF5 stack at two multimillion-dollar research facilities: the Large Plasma Device at UCLA (entirely standardized on HDF5) and the hypervelocity dust accelerator at the Colorado Center for Lunar Dust and Atmospheric Studies, University of Colorado at Boulder. Dr. Collette is also a leading developer of the HDF5 for Python (h5py) project.
