Chapter 3. Probability Mass Functions

The code for this chapter is in probability.py. For information about downloading and working with this code, see Using the Code.

Pmfs

Another way to represent a distribution is a probability mass function (PMF), which maps from each value to its probability. A probability is a frequency expressed as a fraction of the sample size, n. To get from frequencies to probabilities, we divide through by n, which is called normalization.

Given a Hist, we can make a dictionary that maps from each value to its probability:

n = hist.Total()
d = {}
for x, freq in hist.Items():
    d[x] = freq / n

Or we can use the Pmf class provided by thinkstats2. Like Hist, the Pmf constructor can take a list, pandas Series, dictionary, Hist, or another Pmf object. Here’s an example with a simple list:

>>> import thinkstats2
>>> pmf = thinkstats2.Pmf([1, 2, 2, 3, 5])
>>> pmf
Pmf({1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2})

The Pmf is normalized so total probability is 1.

Pmf and Hist objects are similar in many ways; in fact, they inherit many of their methods from a common parent class. For example, the methods Values and Items work the same way for both. The biggest difference is that a Hist maps from values to integer counters; a Pmf maps from values to floating-point probabilities.

To look up the probability associated with a value, use Prob:

>>> pmf.Prob(2)
0.4

The bracket operator is equivalent:

>>> pmf[2]
0.4

You can modify an existing Pmf by incrementing the probability associated with a ...

Get Think Stats, 2nd Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.