Chapter 14. Regression
Regression is a supervised machine learning process. It is similar to classification, but rather than predicting a label, we try to predict a continuous value. If you are trying to predict a number, then use regression.
It turns out that sklearn supports many of the same classification models for regression problems. In fact, the API is the same: you call .fit, .score, and .predict. This is also true for the next-generation boosting libraries, XGBoost and LightGBM.
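As a minimal sketch of that shared API, the snippet below fits two sklearn regressors with the same three calls. The synthetic data from make_regression and the two models chosen are assumptions made for illustration, not the dataset or models used later in this chapter. For regressors, .score returns the coefficient of determination (R²) rather than accuracy.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption for this sketch, not a real dataset).
X, y = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scores = {}
for model in (
    LinearRegression(),
    RandomForestRegressor(n_estimators=50, random_state=42),
):
    model.fit(X_train, y_train)        # same call as for a classifier
    preds = model.predict(X_test)      # continuous values, not labels
    # For regressors, .score reports R^2 on the given data.
    scores[type(model).__name__] = model.score(X_test, y_test)

print(scores)
```

Swapping in another regressor should only require changing the constructor; the loop body stays the same.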
Though there are similarities with the classification models and hyperparameters, the evaluation metrics are different for regression. This chapter will review many of the types of regression models. We will use the Boston housing dataset to explore them.
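To give a rough idea of what regression metrics look like, here is a small sketch using three functions from sklearn.metrics. The y_true and y_pred arrays are made-up values chosen for easy arithmetic; they are not drawn from the housing data.

```python
import numpy as np
from sklearn import metrics

# Hypothetical targets and predictions, invented for this illustration.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

# Mean absolute error: average size of the errors, in the target's units.
mae = metrics.mean_absolute_error(y_true, y_pred)
# Mean squared error: penalizes large errors more heavily.
mse = metrics.mean_squared_error(y_true, y_pred)
# R^2: fraction of the target's variance explained by the predictions.
r2 = metrics.r2_score(y_true, y_pred)

print(mae, mse, r2)
```

Unlike accuracy, these scores measure how far the predictions are from the true values instead of counting exact matches.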
Here we load the data, create a split version for training and testing, and create another split version with standardized data:
>>> import pandas as pd
>>> from sklearn.datasets import load_boston
>>> from sklearn import (
...     model_selection,
...     preprocessing,
... )
>>> b = load_boston()
>>> bos_X = pd.DataFrame(
...     b.data, columns=b.feature_names
... )
>>> bos_y = b.target
>>> bos_X_train, bos_X_test, bos_y_train, bos_y_test = model_selection.train_test_split(
...     bos_X,
...     bos_y,
...     test_size=0.3,
...     random_state=42,
... )
>>> bos_sX = preprocessing.StandardScaler().fit_transform(
...     bos_X
... )
>>> bos_sX_train, bos_sX_test, bos_sy_train, bos_sy_test = model_selection.train_test_split(
...     bos_sX,
...     bos_y,
...     test_size ...