Chapter 5. Big Data

 

"More is different."

 
 --Philip Warren Anderson

In the previous chapters, we've used regression techniques to fit models to the data. In Chapter 3, Correlation, for example, we built a linear model that used ordinary least squares and the normal equation to fit a straight line through the athletes' heights and log weights. In Chapter 4, Classification, we used Incanter's optimize namespace to minimize the logistic cost function and build a classifier of Titanic's passengers. In this chapter, we'll apply similar analysis in a way that's suitable for much larger quantities of data.

We'll be working with a relatively modest dataset of only 100,000 records. This isn't big data (at 100 MB, it will fit comfortably in the memory of ...

Get Clojure for Data Science now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.