## Chapter 3. Probability Mass Functions

The code for this chapter is in `probability.py`. For information about downloading
and working with this code, see Using the Code.

## Pmfs

Another way to represent a distribution is a **probability mass function** (PMF), which maps from
each value to its probability. A **probability** is a frequency expressed as a fraction
of the sample size, `n`. To get from frequencies to probabilities, we divide through by `n`,
which is called **normalization**.
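As a worked instance of this division, here is a small sketch that assumes a hypothetical sample of five values, 1, 2, 2, 3, 5, so that $n = 5$. The symbols $f(x)$ for the frequency of $x$ and $p(x)$ for its probability are notation introduced here for illustration:

$$
p(x) = \frac{f(x)}{n}, \qquad p(2) = \frac{2}{5} = 0.4, \qquad \sum_x p(x) = \frac{1 + 2 + 1 + 1}{5} = 1
$$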

Given a Hist, we can make a dictionary that maps from each value to its probability:

    n = hist.Total()
    d = {}
    for x, freq in hist.Items():
        d[x] = freq / n
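To try the same normalization step without `thinkstats2` installed, the following minimal sketch uses `collections.Counter` from the standard library as a stand-in for a Hist. The sample list and the names `sample`, `counts`, and `probs` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample; Counter plays the role of a Hist here.
sample = [1, 2, 2, 3, 5]
counts = Counter(sample)

# Total number of observations (the sample size n).
n = sum(counts.values())

# Divide each frequency by n to turn it into a probability.
probs = {x: freq / n for x, freq in counts.items()}

print(probs)  # {1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2}
```

Because every frequency is divided by the same total, the resulting probabilities add up to 1, apart from floating-point rounding.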

Or we can use the Pmf class provided by `thinkstats2`. Like Hist, the Pmf constructor can
take a list, pandas Series, dictionary, Hist, or another Pmf object.
Here’s an example with a simple list:

    >>> import thinkstats2
    >>> pmf = thinkstats2.Pmf([1, 2, 2, 3, 5])
    >>> pmf
    Pmf({1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2})

The Pmf is normalized so total probability is 1.

Pmf and Hist objects are similar in many ways; in fact, they inherit
many of their methods from a common parent class. For example, the methods
`Values` and `Items` work the same way for both. The biggest
difference is that a Hist maps from values to integer counters; a Pmf maps
from values to floating-point probabilities.

To look up the probability associated with a value, use `Prob`:

    >>> pmf.Prob(2)
    0.4

The bracket operator is equivalent:

    >>> pmf[2]
    0.4
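One way a method call and the bracket operator can end up equivalent is for `__getitem__` to delegate to the lookup method. The sketch below illustrates that idea with a hypothetical class, `SimplePmf`. It is not the `thinkstats2` implementation, and returning 0 for a value that never appeared is an assumption of this sketch.

```python
from collections import Counter


class SimplePmf:
    """Hypothetical stand-in for a PMF; not the thinkstats2 class."""

    def __init__(self, values):
        counts = Counter(values)
        n = sum(counts.values())
        # Map each value to its normalized frequency.
        self.d = {x: freq / n for x, freq in counts.items()}

    def Prob(self, x, default=0):
        # Assumption: values that were never seen get the default, 0.
        return self.d.get(x, default)

    def __getitem__(self, x):
        # The bracket operator delegates to Prob, so both agree.
        return self.Prob(x)


pmf = SimplePmf([1, 2, 2, 3, 5])
print(pmf.Prob(2))  # 0.4
print(pmf[2])       # 0.4
print(pmf[4])       # 0, because 4 is not in the sample
```

Routing the bracket operator through the method keeps the two lookups from drifting apart if the lookup logic changes later.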

You can modify an existing Pmf by incrementing the probability associated with a ...
