Chapter 1. Laying the Foundation for Reproducible Data Analysis

In this chapter, we will cover the following recipes:

  • Setting up Anaconda
  • Installing the Data Science Toolbox
  • Creating a virtual environment with virtualenv and virtualenvwrapper
  • Sandboxing Python applications with Docker images
  • Keeping track of package versions and history in IPython Notebooks
  • Configuring IPython
  • Learning to log for robust error checking
  • Unit testing your code
  • Configuring pandas
  • Configuring matplotlib
  • Seeding random number generators and NumPy print options
  • Standardizing reports, code style, and data access


Reproducible data analysis is a cornerstone of good science. In today's rapidly evolving world of science and technology, reproducibility is a hot topic. Reproducibility ...

