Chapter 3. Probability Mass Functions
The code for this chapter is in probability.py. For information about downloading and working with this code, see Using the Code.
Pmfs
Another way to represent a distribution is a probability mass function (PMF), which maps from each value to its probability. A probability is a frequency expressed as a fraction of the sample size, n. To get from frequencies to probabilities, we divide through by n, which is called normalization.
Given a Hist, we can make a dictionary that maps from each value to its probability:
n = hist.Total()
d = {}
for x, freq in hist.Items():
    d[x] = freq / n
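The same normalization can be tried without thinkstats2. The following is a minimal sketch that assumes Python 3 (so that `/` is true division) and uses `collections.Counter` as a stand-in for Hist; the names `values`, `counts`, and `probs` are illustrative and do not come from the book's code.

```python
from collections import Counter

values = [1, 2, 2, 3, 5]        # illustrative sample data
counts = Counter(values)        # value -> frequency (stand-in for a Hist)
n = sum(counts.values())        # sample size

# Normalize: divide each frequency by the sample size.
probs = {x: freq / n for x, freq in counts.items()}

print(probs)  # {1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2}
```

Under Python 2, `freq / n` with two integers would truncate to 0, so this sketch relies on Python 3 semantics.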
Or we can use the Pmf class provided by thinkstats2. Like Hist, the Pmf constructor can take a list, pandas Series, dictionary, Hist, or another Pmf object.
Here’s an example with a simple list:
>>> import thinkstats2
>>> pmf = thinkstats2.Pmf([1, 2, 2, 3, 5])
>>> pmf
Pmf({1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2})
The Pmf is normalized so total probability is 1.
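One way to check this property is to sum the probabilities and compare the result to 1 with a floating-point tolerance. The sketch below uses a plain dictionary holding the probabilities shown above instead of a Pmf object; `probs`, `total`, and `is_normalized` are illustrative names.

```python
import math

# Probabilities from the example above, stored in a plain dictionary.
probs = {1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2}

total = sum(probs.values())

# Compare with a tolerance, since floating-point sums may not be exactly 1.
is_normalized = math.isclose(total, 1.0)

print(is_normalized)  # True
```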
Pmf and Hist objects are similar in many ways; in fact, they inherit many of their methods from a common parent class. For example, the methods Values and Items work the same way for both. The biggest difference is that a Hist maps from values to integer counters; a Pmf maps from values to floating-point probabilities.
To look up the probability associated with a value, use Prob:
>>> pmf.Prob(2)
0.4
The bracket operator is equivalent:
>>> pmf[2]
0.4
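To see how a method and the bracket operator can share one lookup, here is a small sketch. `TinyPmf` is a hypothetical class written for illustration only; it is not the thinkstats2 implementation, and returning 0 for a value that never appeared is an assumption of this sketch.

```python
class TinyPmf:
    """Hypothetical stand-in for illustration; not the thinkstats2 Pmf."""

    def __init__(self, values):
        n = len(values)
        self.d = {}
        # Each occurrence of a value contributes 1/n to its probability.
        for x in values:
            self.d[x] = self.d.get(x, 0) + 1 / n

    def Prob(self, x, default=0):
        # Assumption: values not in the sample get probability `default`.
        return self.d.get(x, default)

    def __getitem__(self, x):
        # The bracket operator delegates to Prob, so both agree.
        return self.Prob(x)


pmf = TinyPmf([1, 2, 2, 3, 5])
print(pmf.Prob(2) == pmf[2])  # True
```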
You can modify an existing Pmf by incrementing the probability associated with a ...