Think Stats, 2nd Edition

by Allen B. Downey
October 2014
Content level: Beginner
226 pages
5h 42m
English
O'Reilly Media, Inc.

Chapter 2. Distributions

One of the best ways to describe a variable is to report the values that appear in the dataset and how many times each value appears. This description is called the distribution of the variable.

The most common representation of a distribution is a histogram, which is a graph that shows the frequency of each value. In this context, “frequency” means the number of times the value appears.

In Python, an efficient way to compute frequencies is with a dictionary. Given a sequence of values, t:

hist = {}
for x in t:
    hist[x] = hist.get(x, 0) + 1
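
To see the loop in action, it can be run on a small concrete sequence. This is only a minimal sketch; the sample list reuses the values that appear later in the Hist example, and any sequence of hashable values would work the same way:

```python
# Sample values for illustration
t = [1, 2, 2, 3, 5]

hist = {}
for x in t:
    # get returns 0 the first time a value is seen
    hist[x] = hist.get(x, 0) + 1

print(hist)  # {1: 1, 2: 2, 3: 1, 5: 1}
```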

The result is a dictionary that maps from values to frequencies. Alternatively, you could use the Counter class defined in the collections module:

from collections import Counter
counter = Counter(t)

The result is a Counter object, which is a subclass of dictionary.
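
Because Counter subclasses dict, it supports the usual dictionary operations and adds a few conveniences of its own. The short sketch below uses an illustrative sample list; the variable names are chosen here for the example:

```python
from collections import Counter

t = [1, 2, 2, 3, 5]
counter = Counter(t)

# Counter is a dict subclass, so ordinary dictionary lookups work
is_dict = isinstance(counter, dict)
freq_of_2 = counter[2]

# Unlike a plain dict, a missing key yields 0 instead of raising KeyError
freq_of_4 = counter[4]

# most_common lists (value, frequency) pairs, most frequent first
top = counter.most_common(1)

print(is_dict, freq_of_2, freq_of_4, top)  # True 2 0 [(2, 2)]
```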

Another option is to use the pandas method value_counts, which we saw in the previous chapter. But for this book I created a class, Hist, that represents histograms and provides the methods that operate on them.

Representing Histograms

The Hist constructor can take a sequence, dictionary, pandas Series, or another Hist. You can instantiate a Hist object like this:

>>> import thinkstats2
>>> hist = thinkstats2.Hist([1, 2, 2, 3, 5])
>>> hist
Hist({1: 1, 2: 2, 3: 1, 5: 1})

Hist objects provide Freq, which takes a value and returns its frequency:

>>> hist.Freq(2)
2

The bracket operator does the same thing:

>>> hist[2]
2
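
One way a method and the bracket operator can share the same lookup is sketched below. SimpleHist is a hypothetical name invented for this illustration, and the code is not taken from thinkstats2; it only shows the general pattern of delegating `__getitem__` to a named method:

```python
from collections import Counter


class SimpleHist:
    """Hypothetical stand-in for illustration; not the thinkstats2 code."""

    def __init__(self, values):
        # Count how many times each value appears
        self.d = Counter(values)

    def Freq(self, x):
        # Return the stored count for x
        return self.d[x]

    def __getitem__(self, x):
        # hist[x] delegates to Freq, so both spellings agree
        return self.Freq(x)


hist = SimpleHist([1, 2, 2, 3, 5])
print(hist.Freq(2), hist[2])  # 2 2
```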

If you look up a value that has never appeared, the frequency ...


ISBN: 9781491907344