Chapter 3. Probability Mass Functions
The code for this chapter is in probability.py. For information about downloading and working with this code, see Using the Code.
Pmfs
Another way to represent a distribution is a probability mass function (PMF), which maps from each value to its probability. A probability is a frequency expressed as a fraction of the sample size, n. To get from frequencies to probabilities, we divide each frequency by n; this is called normalization.
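As a small worked illustration of normalization, the sketch below uses only Python's standard library (collections.Counter and fractions.Fraction, not thinkstats2) on a five-value sample. Using Fraction keeps each probability exact, so you can see directly that dividing every frequency by n makes the probabilities sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction

sample = [1, 2, 2, 3, 5]
n = len(sample)            # sample size
freqs = Counter(sample)    # value -> frequency, e.g. 2 appears twice

# Normalization: divide each frequency by the sample size n.
probs = {x: Fraction(freq, n) for x, freq in freqs.items()}

print(probs[2])              # prints 2/5
print(sum(probs.values()))   # prints 1
```

With floating-point division instead of Fraction, the same probabilities would be 0.2, 0.4, 0.2 and 0.2.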
Given a Hist, we can make a dictionary that maps from each value to its probability:
n = hist.Total()              # total number of observations
d = {}
for x, freq in hist.Items():
    d[x] = freq / n           # normalize each frequency to a probability
Or we can use the Pmf class provided by thinkstats2. Like Hist, the Pmf constructor can take a list, pandas Series, dictionary, Hist, or another Pmf object.
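To see how a single constructor can accept more than one kind of input, here is a hypothetical make_pmf helper written with only the standard library. It is not the thinkstats2 implementation; it assumes that a mapping holds value-to-frequency pairs and that any other iterable is a sequence of observed values, and it does not handle pandas Series, Hist, or Pmf objects.

```python
from collections import Counter
from collections.abc import Mapping


def make_pmf(data):
    """Hypothetical helper: build a normalized value -> probability dict.

    A Mapping is treated as value -> frequency; any other iterable is
    treated as a sequence of observed values. Assumes a non-empty input.
    """
    if isinstance(data, Mapping):
        freqs = dict(data)          # already value -> frequency
    else:
        freqs = Counter(data)       # count each observed value
    n = sum(freqs.values())         # total frequency (sample size)
    return {x: f / n for x, f in freqs.items()}


print(make_pmf([1, 2, 2, 3, 5]))              # from a list of observations
print(make_pmf({1: 1, 2: 2, 3: 1, 5: 1}))     # from value -> count pairs
```

Both calls describe the same sample, so they produce the same PMF.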
Here’s an example with a simple list:
>>> import thinkstats2
>>> pmf = thinkstats2.Pmf([1, 2, 2, 3, 5])
>>> pmf
Pmf({1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2})
The Pmf is normalized, so the total probability is 1.
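Because probabilities are stored as floating-point numbers, their sum can differ from 1 by a tiny rounding error, so an exact equality test is fragile. A sketch of a tolerant check, assuming the PMF is available as a plain dict of probabilities (is_normalized is a hypothetical name):

```python
import math


def is_normalized(probs, rel_tol=1e-9):
    """Return True if the probabilities sum to 1 within a relative tolerance."""
    return math.isclose(sum(probs.values()), 1.0, rel_tol=rel_tol)


# Ten equally likely values: 1/10 has no exact binary representation.
uniform = {x: 1 / 10 for x in range(10)}
print(is_normalized(uniform))             # prints True
print(is_normalized({1: 0.5, 2: 0.25}))   # prints False
```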
Pmf and Hist objects are similar in many ways; in fact, they inherit many of their methods from a common parent class. For example, the methods Values and Items work the same way for both. The biggest difference is that a Hist maps from values to integer counters; a Pmf maps from values to floating-point probabilities.
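The shared-parent design can be sketched with three small classes. The names DictWrapperSketch, HistSketch, and PmfSketch are hypothetical, and this is not how thinkstats2 is actually written; the sketch only shows how Values and Items can live in one parent class while the subclasses differ in what they store (integer counts versus floating-point probabilities).

```python
from collections import Counter


class DictWrapperSketch:
    """Hypothetical parent class: wraps a dict and provides shared methods."""

    def __init__(self, d):
        self.d = dict(d)

    def Values(self):
        # The distinct values, in first-seen order.
        return self.d.keys()

    def Items(self):
        # (value, count-or-probability) pairs.
        return self.d.items()


class HistSketch(DictWrapperSketch):
    """Maps each value to an integer count."""

    def __init__(self, values):
        super().__init__(Counter(values))


class PmfSketch(DictWrapperSketch):
    """Maps each value to a floating-point probability (assumes non-empty input)."""

    def __init__(self, values):
        counts = Counter(values)
        n = sum(counts.values())
        super().__init__({x: c / n for x, c in counts.items()})


hist = HistSketch([1, 2, 2, 3, 5])
pmf = PmfSketch([1, 2, 2, 3, 5])
print(dict(hist.Items()))   # prints {1: 1, 2: 2, 3: 1, 5: 1}
print(dict(pmf.Items()))    # prints {1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2}
```

Only the constructors differ; Values and Items are written once in the parent and behave identically for both subclasses.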
To look up the probability associated with a value, use Prob:
>>> pmf.Prob(2)
0.4
The bracket operator is equivalent:
>>> pmf[2]
0.4
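One way to make the bracket operator equivalent to Prob is to have __getitem__ delegate to it. The class below is a hypothetical, minimal sketch rather than the thinkstats2 source; returning 0.0 for a value that never appears is an assumption of this sketch.

```python
class PmfSketch:
    """Hypothetical minimal PMF wrapper around a value -> probability dict."""

    def __init__(self, probs):
        self.d = dict(probs)

    def Prob(self, x, default=0.0):
        # Values that were never observed get the default probability
        # (an assumption of this sketch).
        return self.d.get(x, default)

    def __getitem__(self, x):
        # The bracket operator simply delegates to Prob.
        return self.Prob(x)


pmf = PmfSketch({1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2})
print(pmf.Prob(2))   # prints 0.4
print(pmf[2])        # prints 0.4
print(pmf[4])        # prints 0.0
```

Because both lookups go through one method, any change to how missing values are handled applies to both forms.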
You can modify an existing Pmf by incrementing the probability associated with a ...