Chapter 5. Big Data

 

"More is different."

 
 --Philip Warren Anderson

In the previous chapters, we've used regression techniques to fit models to the data. In Chapter 3, Correlation, for example, we built a linear model that used ordinary least squares and the normal equation to fit a straight line through the athletes' heights and log weights. In Chapter 4, Classification, we used Incanter's optimize namespace to minimize the logistic cost function and build a classifier of Titanic's passengers. In this chapter, we'll apply similar analysis in a way that's suitable for much larger quantities of data.

We'll be working with a relatively modest dataset of only 100,000 records. This isn't big data (at 100 MB, it will fit comfortably in the memory of ...

Get Clojure for Data Science now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.