Chapter 13. Survival Analysis
Survival analysis is a way to describe how long things last. It is often used to study human lifetimes, but it also applies to “survival” of mechanical and electronic components, or more generally to intervals in time before an event.
If someone you know has been diagnosed with a life-threatening disease, you might have seen a “5-year survival rate,” which is the probability of surviving five years after diagnosis. That estimate and related statistics are the result of survival analysis.
The code in this chapter is in survival.py. For information about downloading and working with this code, see Using the Code.
Survival Curves
The fundamental concept in survival analysis is the survival curve, $S(t)$, which is a function that maps from a duration, $t$, to the probability of surviving longer than $t$. If you know the distribution of durations, or “lifetimes”, finding the survival curve is easy; it’s just the complement of the CDF:

$$S(t) = 1 - \mathrm{CDF}(t)$$

where $\mathrm{CDF}(t)$ is the probability of a lifetime less than or equal to $t$.
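The following is a minimal sketch of this definition in plain NumPy. It is not the implementation in survival.py, and the sample of lifetimes is made up purely for illustration. The empirical survival curve at $t$ is the fraction of observed lifetimes strictly greater than $t$, which is the same as one minus the empirical CDF at $t$. The helper names `empirical_cdf`, `survival` and `survival_curve` are hypothetical.

```python
import numpy as np

def empirical_cdf(lifetimes, t):
    """Fraction of lifetimes less than or equal to t (the empirical CDF)."""
    lifetimes = np.asarray(lifetimes)
    return np.mean(lifetimes <= t)

def survival(lifetimes, t):
    """S(t) = 1 - CDF(t): fraction of lifetimes strictly greater than t."""
    return 1.0 - empirical_cdf(lifetimes, t)

def survival_curve(lifetimes, ts):
    """Vectorized S(t) for many values of t at once.

    After sorting, searchsorted(..., side='right') returns, for each t,
    the number of lifetimes <= t, so dividing by n gives CDF(t).
    """
    sorted_lifetimes = np.sort(np.asarray(lifetimes))
    counts_le = np.searchsorted(sorted_lifetimes, ts, side="right")
    return 1.0 - counts_le / len(sorted_lifetimes)

# Hypothetical lifetimes (illustrative only, not from the NSFG or any dataset)
sample = [1, 2, 2, 3, 5, 8]
print(survival(sample, 2))                  # 0.5: three of six lifetimes exceed 2
print(survival_curve(sample, [0, 2, 5, 8]))
```

The vectorized version sorts once and then answers each query with a binary search, which is cheaper than rescanning the whole sample for every $t$ when the curve is evaluated at many points.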
For example, in the NSFG dataset, we know the duration of 11189 complete pregnancies. We can read this data and compute the CDF:
preg = nsfg.ReadFemPreg()
complete = preg.query('outcome ...