Chapter 4. Cohort Analysis
In Chapter 3 we covered time series analysis. With those techniques in hand, we will now turn to a related type of analysis with many business and other applications: cohort analysis.
I remember the first time I encountered a cohort analysis. I was working at my first data analyst job, at a small startup. I was reviewing a purchase analysis I’d worked on with the CEO, and he suggested that I break up the customer base by cohorts to see whether behavior was changing over time. I assumed it was some fancy business school thing and probably useless, but he was the CEO, so of course I humored him. Turns out it wasn’t just a lark. Breaking populations into cohorts and following them over time is a powerful way to analyze your data and avoid various biases. Cohorts can provide clues to how subpopulations differ from each other and how they change over time.
In this chapter, we’ll first take a look at what cohorts are and at the building blocks of certain types of cohort analysis. After an introduction to the legislators data set used for the examples, we’ll learn how to construct a retention analysis and deal with various challenges such as defining the cohort and handling sparse data. Next, we’ll cover survivorship, returnship, and cumulative calculations, all of which are similar to retention analysis in the way the SQL code is structured. Finally, we’ll look at how to combine cohort analysis with cross-sectional analysis to understand the makeup of populations ...