Chapter 12. Time Series Analysis

A time series is a sequence of measurements from a system that varies in time. One famous example is the “hockey stick graph” that shows global average temperature over time.

The example I work with in this chapter comes from Zachary M. Jones, a researcher in political science who studies the black market for cannabis in the US. He collected data from a website called Price of Weed that crowdsources market information by asking participants to report the price, quantity, quality, and location of cannabis transactions. The goal of his project is to investigate the effect of policy decisions, like legalization, on markets. I find this project appealing because it uses data to address important political questions, like drug policy.

I hope you will find this chapter interesting, but I’ll take this opportunity to reiterate the importance of maintaining a professional attitude to data analysis. Whether and which drugs should be illegal are important and difficult public policy questions; our decisions should be informed by accurate data reported honestly.

The code for this chapter is in timeseries.py. For information about downloading and working with this code, see Using the Code.

Importing and Cleaning

The data I downloaded from Mr. Jones’s site is in the repository for this book. The following code reads it into a pandas DataFrame:

    import pandas
    transactions = pandas.read_csv('mj-clean.csv', parse_dates=[5])

parse_dates tells read_csv to interpret values ...
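To see the effect of parse_dates in isolation, here is a minimal sketch that reads a tiny in-memory CSV instead of mj-clean.csv. The column names and values are invented for illustration and are not taken from the real file; the only thing the sketch assumes is that the column at index 5 holds date strings.

```python
import io

import pandas

# Hypothetical stand-in for mj-clean.csv: the column names and
# values below are made up for illustration only.
csv_text = """city,state,price,amount,quality,date
Springfield,AA,100,7.0,high,2010-09-02
Shelbyville,BB,60,28.0,medium,2010-09-03
"""

# parse_dates=[5] asks read_csv to convert the column at
# position 5 (zero-based) from strings to datetime values.
sample = pandas.read_csv(io.StringIO(csv_text), parse_dates=[5])

# Check that the column is now a datetime type, not plain strings.
is_datetime = pandas.api.types.is_datetime64_any_dtype(sample['date'])
first_year = sample['date'].iloc[0].year

print(sample.dtypes)
print(is_datetime, first_year)
```

Once the column has a datetime type, its values support date arithmetic and attributes such as .year, which is what makes them convenient for time series work.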
