15.5 Case Study: Multiple Linear Regression with the California Housing Dataset
In Chapter 10’s Intro to Data Science section, we performed simple linear regression on a small weather data time series using pandas, Seaborn’s regplot function and the SciPy stats module’s linregress function. In the previous section, we reimplemented that same example using scikit-learn’s LinearRegression estimator, Seaborn’s scatterplot function and Matplotlib’s plot function. Now, we’ll perform linear regression with a much larger real-world dataset.
The California Housing dataset⁷ bundled with scikit-learn has 20,640 samples, each with eight numerical features. We’ll perform a multiple linear regression that uses all eight numerical features to make more ...
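As a rough preview of the workflow this case study describes, here is a minimal sketch that fits scikit-learn’s LinearRegression estimator to all eight features. The helper names fit_multiple_regression and run_california_example are hypothetical, as are the random_state value and the default 75%/25% train/test split; none of these come from the text. Note that fetch_california_housing downloads and caches the data the first time it is called, so the second helper needs network access on first use.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_multiple_regression(X, y, random_state=11):
    """Fit LinearRegression on a training split; return (model, test R^2).

    Hypothetical helper for illustration; random_state is an arbitrary choice.
    """
    # By default train_test_split holds out 25% of the samples for testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state)
    model = LinearRegression()
    model.fit(X_train, y_train)  # learns one coefficient per feature
    # For regressors, score() returns the coefficient of determination (R^2)
    return model, model.score(X_test, y_test)


def run_california_example():
    """Load the California Housing data and print the fitted coefficients."""
    from sklearn.datasets import fetch_california_housing

    california = fetch_california_housing()  # downloads on first call
    model, r2 = fit_multiple_regression(california.data, california.target)
    for name, coef in zip(california.feature_names, model.coef_):
        print(f'{name:>10}: {coef}')
    print(f' intercept: {model.intercept_}')
    print(f'R^2 on the test split: {r2:.3f}')
```

Calling run_california_example() prints one coefficient per feature plus the intercept. Keeping the fitting step in its own function means the same code can be tried on any feature array and target, not just this dataset.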