Chapter 5. Big Data
"More is different."
-- Philip Warren Anderson
In the previous chapters, we've used regression techniques to fit models to data. In Chapter 3, Correlation, for example, we built a linear model that used ordinary least squares and the normal equation to fit a straight line through the athletes' heights and log weights. In Chapter 4, Classification, we used Incanter's optimize namespace to minimize the logistic cost function and build a classifier of the Titanic's passengers. In this chapter, we'll apply a similar analysis in a way that scales to much larger quantities of data.
We'll be working with a relatively modest dataset of only 100,000 records. This isn't big data (at 100 MB, it will fit comfortably in the memory of ...