Chapter 4. Cumulative Distribution Functions
The code for this chapter is in cumulative.py. For information about downloading and working with this code, see Using the Code.
The Limits of PMFs
PMFs work well if the number of values is small. But as the number of values increases, the probability associated with each value gets smaller and the effect of random noise increases.
For example, we might be interested in the distribution of birth weights. In the NSFG data, the variable totalwgt_lb records weight at birth in pounds. Figure 4-1 shows the PMF of these values for first babies and others.

Overall, these distributions resemble the bell shape of a normal distribution, with many values near the mean and a few values much higher and lower.
But parts of this figure are hard to interpret. There are many spikes and valleys, and some apparent differences between the distributions. It is hard to tell which of these features are meaningful. Also, it is hard to see overall patterns; for example, which distribution do you think has the higher mean?
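To see how much of this unevenness can come from noise alone, here is a minimal sketch that draws two samples from the same distribution and compares their PMFs. It does not use the NSFG data or the code in cumulative.py. The weights are synthetic, the mean of 7.3 and standard deviation of 1.2 are made-up parameters, the rounding to the nearest ounce is an assumption, and make_pmf and sample_weights are hypothetical helpers written for this illustration.

```python
import random
from collections import Counter


def make_pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}


# Fixed seed so the sketch is repeatable.
rng = random.Random(17)


def sample_weights(n):
    """Synthetic weights in pounds, rounded to the nearest ounce.

    The parameters (mean 7.3, standard deviation 1.2) are invented
    for illustration; they are not estimates from the NSFG data.
    """
    return [round(rng.gauss(7.3, 1.2) * 16) / 16 for _ in range(n)]


# Two samples from the *same* distribution.
pmf_a = make_pmf(sample_weights(4000))
pmf_b = make_pmf(sample_weights(4000))

# Compare the two PMFs value by value.
values = sorted(set(pmf_a) | set(pmf_b))
diffs = {v: pmf_a.get(v, 0) - pmf_b.get(v, 0) for v in values}

largest_gap = max(abs(d) for d in diffs.values())
peak = max(max(pmf_a.values()), max(pmf_b.values()))

print("distinct values:", len(values))
print("largest probability:", peak)
print("largest gap between the two PMFs:", largest_gap)
```

Because both samples come from one distribution, every gap between the two PMFs is noise. With this many distinct values, each probability is small, so noise of that size can look like a real difference between groups.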
These problems can be mitigated by binning the data; that is, dividing the range of values into non-overlapping intervals and counting the number of values in each bin. Binning can be useful, but it is tricky to get the size of the bins right. If they are ...