December 2017
410 pages
In Chapter 14, we considered various ways to measure model performance. Section 14.4 described cross-validation, a technique that estimates model performance by checking how well the model predicts on held-out test data. This chapter explores regularization, one technique for improving performance on test data. Specifically, it aims to prevent overfitting.
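To make the idea concrete before turning to real data, here is a minimal sketch of how a penalty term changes a linear fit. It is an illustration under stated assumptions, not the chapter's own method: the data are synthetic, the penalty is a ridge (L2) penalty chosen only as an example, and the helper `fit_ridge` and the value `alpha=10.0` are hypothetical names and settings introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 30 observations,
# 20 predictors, of which only the first 3 actually affect the response.
n, p = 30, 20
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.0, 0.5]
y = X @ true_beta + rng.normal(scale=1.0, size=n)


def fit_ridge(X, y, alpha):
    """Closed-form ridge fit without an intercept (hypothetical helper).

    Solves (X'X + alpha * I) beta = X'y. With alpha=0 this reduces to
    ordinary least squares.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


beta_ols = fit_ridge(X, y, alpha=0.0)     # unpenalized fit
beta_ridge = fit_ridge(X, y, alpha=10.0)  # penalized fit

# The penalty shrinks the coefficient vector toward zero.
print("OLS coefficient norm:  ", np.linalg.norm(beta_ols))
print("Ridge coefficient norm:", np.linalg.norm(beta_ridge))
```

The penalized coefficient vector has a smaller norm than the unpenalized one. That shrinkage is the mechanism regularization relies on: with many predictors and few observations, large coefficients often reflect noise in the training data, and pulling them toward zero is intended to limit overfitting.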
Let’s begin with a base case of linear regression. We will be using the ACS data.
import pandas as pd
acs = pd.read_csv('../data/acs_ny.csv')
print(acs.columns)
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren', 'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers', ...