15.5 Case Study: Multiple Linear Regression with the California Housing Dataset

In Chapter 10’s Intro to Data Science section, we performed simple linear regression on a small weather data time series using pandas, Seaborn’s regplot function and the linregress function from SciPy’s stats module. In the previous section, we reimplemented that same example using scikit-learn’s LinearRegression estimator, Seaborn’s scatterplot function and Matplotlib’s plot function. Now, we’ll perform linear regression with a much larger real-world dataset.

The California Housing dataset⁷ bundled with scikit-learn has 20,640 samples, each with eight numerical features. We’ll perform a multiple linear regression that uses all eight numerical features to make more ...
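As a preview, here is a minimal sketch of how such a multiple linear regression might look with scikit-learn’s LinearRegression estimator. The helper names fit_multiple_regression and run_case_study, the default train/test split and the random_state value are our own assumptions for illustration, not necessarily the steps this case study follows. Note that fetch_california_housing downloads the data the first time it is called and caches it locally, so run_case_study needs an Internet connection on first use.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_multiple_regression(X, y, random_state=11):
    """Fit LinearRegression on every column of X.

    Returns the fitted model and its R^2 score on a held-out test split.
    random_state is an arbitrary seed chosen for reproducibility.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state)
    model = LinearRegression()
    model.fit(X_train, y_train)  # learns one coefficient per feature
    return model, r2_score(y_test, model.predict(X_test))


def run_case_study():
    """Hypothetical driver: load the dataset and report the fitted model."""
    california = fetch_california_housing()  # Bunch with data, target, feature_names
    print(california.data.shape)  # (number of samples, number of features)
    model, r2 = fit_multiple_regression(california.data, california.target)
    for name, coefficient in zip(california.feature_names, model.coef_):
        print(f'{name:>10}: {coefficient}')
    print(f'R^2 on the test split: {r2:.3f}')
    return model, r2
```

Calling run_case_study() prints the dataset’s shape, one coefficient per feature and the test-split R² score. Because fit_multiple_regression accepts any feature matrix and target array, the same function also works for a quick check on synthetic data.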
