Chapter 1. Laying the Foundation for Reproducible Data Analysis

In this chapter, we will cover the following recipes:

  • Setting up Anaconda
  • Installing the Data Science Toolbox
  • Creating a virtual environment with virtualenv and virtualenvwrapper
  • Sandboxing Python applications with Docker images
  • Keeping track of package versions and history in IPython Notebooks
  • Configuring IPython
  • Learning to log for robust error checking
  • Unit testing your code
  • Configuring pandas
  • Configuring matplotlib
  • Seeding random number generators and NumPy print options
  • Standardizing reports, code style, and data access

Introduction

Reproducible data analysis is a cornerstone of good science. In today's rapidly evolving world of science and technology, reproducibility is a hot topic. Reproducibility ...
