CHAPTER 5 Survival Analysis

Survival analysis is a set of statistical techniques focusing on the occurrence and timing of events.1 As the name suggests, it originates from a medical context, where it was used to study the survival times of patients who had received certain treatments. Many of the classification problems discussed earlier also have a time aspect, which can be analyzed using survival analysis techniques. Some examples are:2

  • Predict when customers churn
  • Predict when customers make their next purchase
  • Predict when customers default
  • Predict when customers pay off their loan early
  • Predict when customers will visit a website next

Two typical problems complicate the use of classical statistical techniques such as linear regression. A first key problem is censoring. Censoring refers to the fact that the target time variable is not always known, because some customers may not yet have undergone the event at the time of the analysis. Consider the example depicted in Figure 5.1. At time T, Laura and John have not churned yet and thus have no value for the target time indicator. The only information available is that they will churn at some later date after T. Note also that Sophie is censored at the time she moved to Australia. These are all examples of right censoring. An observation on a variable T is right censored if all you know about T is that it is greater than some value c. Likewise, an observation on a variable
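To make right censoring concrete, the following minimal Python sketch shows one common way to encode it: each customer is reduced to a pair (duration, event_observed). The customer names Laura, John, and Sophie come from the text, but every number below, the extra customer "Alex", the `Customer` class, and the `observe` function are assumptions made purely for illustration and are not taken from Figure 5.1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Customer:
    name: str
    start: float                       # time the customer became active
    churn: Optional[float] = None      # true churn time, if known
    lost_at: Optional[float] = None    # time follow-up stopped (e.g., moved abroad)


def observe(c: Customer, analysis_time: float) -> Tuple[float, bool]:
    """Return (duration, event_observed) as seen at analysis_time.

    If event_observed is False, the record is right censored:
    all we know is that the true duration exceeds the returned value.
    """
    # Observation stops at the analysis time or when follow-up is lost,
    # whichever comes first.
    cutoff = analysis_time if c.lost_at is None else min(analysis_time, c.lost_at)
    if c.churn is not None and c.churn <= cutoff:
        return c.churn - c.start, True
    return cutoff - c.start, False


# Hypothetical toy data (invented values, analysis time T = 12).
T = 12.0
customers = [
    Customer("Alex", start=0.0, churn=8.0),      # churn observed
    Customer("Laura", start=2.0),                # still active at T
    Customer("John", start=5.0),                 # still active at T
    Customer("Sophie", start=1.0, lost_at=6.0),  # follow-up ends early
]

observations = {c.name: observe(c, T) for c in customers}
for name, (duration, event) in observations.items():
    status = "event observed" if event else "right censored"
    print(f"{name}: duration={duration}, {status}")
```

Under these assumptions, only Alex contributes an exact churn time; for the other three, the stored duration is a lower bound on the true time to churn. Dropping those rows, or treating their durations as if they were churn times, would distort a plain regression on duration, which is why the censoring flag is kept alongside each duration.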
