CHAPTER 6
Multiple Linear Regression
In this chapter, we introduce linear regression models for the purpose of prediction. We discuss how fitting and using regression models differs when the goal is inference (as in classical statistics) rather than prediction. A predictive goal calls for evaluating model performance on a validation set and for using predictive metrics. We then raise the challenges of using many predictors and describe variable selection algorithms that are often implemented in linear regression procedures.
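To make the validation-set idea concrete, here is a minimal sketch on synthetic data. The data, the 60/40 split, and the random seeds are illustrative assumptions, not the chapter's dataset. The model is fit on the training partition only. Predictive accuracy is then measured on the held-out validation partition using mean error (ME), root mean squared error (RMSE), and mean absolute error (MAE). The helper `predictive_metrics` is a hypothetical name. The dmba function `regressionSummary` imported in this chapter reports accuracy measures of this kind, but this sketch does not depend on it.

```python
# Minimal sketch: fit on a training partition, score on a held-out validation partition.
# Assumptions: synthetic data, 60/40 split, fixed seeds (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
# Synthetic ground truth: y = 2 + 1.5*x0 - 0.8*x1 + 0*x2 + noise
y = 2 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Hold out 40% of the records as the validation set
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)

# Fit the regression on the training data only
model = LinearRegression().fit(X_train, y_train)


def predictive_metrics(y_true, y_pred):
    """Hypothetical helper: common predictive accuracy measures computed from residuals."""
    errors = y_true - y_pred
    return {
        "ME": float(np.mean(errors)),                  # average bias of the predictions
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),  # typical error size, penalizes large errors
        "MAE": float(np.mean(np.abs(errors))),         # average absolute error
    }


train_metrics = predictive_metrics(y_train, model.predict(X_train))
valid_metrics = predictive_metrics(y_valid, model.predict(X_valid))
print("training:  ", train_metrics)
print("validation:", valid_metrics)
```

The comparison between the two printouts is the point of the exercise. Error on the training data tends to be optimistic because the model was fit to those records. The validation figures are therefore the ones to use when judging predictive performance.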
Python
In this chapter, we will use pandas for data handling and scikit-learn for building the models and for variable (feature) selection. We will also make use of utility functions from the Python Utilities Functions Appendix. We could also use statsmodels for the linear regression models. However, statsmodels provides more information than is needed for predictive modeling, such as standard errors and p-values, which serve inference rather than prediction. Use the following import statements for the Python code in this chapter.
```python
# import required functionality for this chapter
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge, LassoCV, BayesianRidge
import statsmodels.formula.api as sm
import matplotlib.pyplot as plt
# utility functions from the Python Utilities Functions Appendix (dmba package)
from dmba import regressionSummary, exhaustive_search
from dmba import backward_elimination, forward_selection, stepwise_selection
...
```
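The dmba imports above include `forward_selection`, `backward_elimination`, and `stepwise_selection`. The sketch below shows the idea behind forward selection without relying on those helpers' exact signatures. It is a hand-written greedy search, not dmba's implementation. The function names `forward_select` and `validation_rmse` are hypothetical. Scoring candidates by validation RMSE and stopping when it no longer improves is an assumption made for this sketch; other criteria, such as AIC, are also common in practice. The data are synthetic, and only columns 0 and 3 actually drive the outcome.

```python
# Hypothetical sketch of greedy forward selection scored on a validation set.
# This is NOT dmba.forward_selection; names and the stopping rule are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def validation_rmse(cols, X_train, X_valid, y_train, y_valid):
    """Fit a linear regression on the chosen columns and return validation RMSE."""
    model = LinearRegression().fit(X_train[:, cols], y_train)
    residuals = y_valid - model.predict(X_valid[:, cols])
    return float(np.sqrt(np.mean(residuals ** 2)))


def forward_select(X_train, X_valid, y_train, y_valid):
    """Add one predictor at a time, keeping the one that lowers validation RMSE most;
    stop when no remaining predictor improves it."""
    remaining = list(range(X_train.shape[1]))
    selected = []
    # Baseline: an intercept-only model predicts the training mean for every record
    best_score = float(np.sqrt(np.mean((y_valid - y_train.mean()) ** 2)))
    while remaining:
        scores = {j: validation_rmse(selected + [j], X_train, X_valid, y_train, y_valid)
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_score:
            break  # no candidate improves the validation error
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score


rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 6))
# Synthetic data: only columns 0 and 3 affect y
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=0)

selected, score = forward_select(X_train, X_valid, y_train, y_valid)
print("selected columns:", selected, "validation RMSE:", round(score, 3))
```

Backward elimination runs the same kind of loop in reverse. It starts with all predictors and drops, one at a time, the one whose removal helps most. Stepwise selection considers both adding and dropping a predictor at each step.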
Get Data Mining for Business Analytics now with the O’Reilly learning platform.