November 2017
Intermediate to advanced
374 pages
10h 19m
English
Here, we will use cross-validation on the diabetes dataset from the previous recipe to improve performance. Start as in the previous recipe: load the dataset and make a train/test split stratified on the binned target:
# Jupyter/IPython magic: render plots inline in the notebook
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

# Load the features and the continuous target (disease progression one year after baseline)
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
X_feature_names = ['age', 'gender', 'body mass index', 'average blood pressure',
                   'bl_0', 'bl_1', 'bl_2', 'bl_3', 'bl_4', 'bl_5']

# Bin the target into 50-unit intervals so the split can be stratified on it
bins = 50 * np.arange(8)
binned_y = np.digitize(y, bins)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=binned_y)
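The recipe's own cross-validation code is not part of this excerpt, so the following is only a minimal sketch of one way to continue from the split above. It assumes a plain `LinearRegression` model, 5 folds, R² scoring and a fixed `random_state`; none of these choices come from the source. One detail worth knowing: `StratifiedKFold` rejects a continuous target, so the sketch builds the folds from the bin labels (the hypothetical `bins_train`) and passes the precomputed splits to `cross_val_score`, which accepts an iterable of `(train, test)` index pairs as `cv`.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Load the data and bin the continuous target, as in the recipe
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
bins = 50 * np.arange(8)
binned_y = np.digitize(y, bins)

# Split X, y and the bin labels together so the training bins stay aligned
# with the training rows (random_state=0 is an assumption, for repeatability)
X_train, X_test, y_train, y_test, bins_train, bins_test = train_test_split(
    X, y, binned_y, test_size=0.2, stratify=binned_y, random_state=0)

# StratifiedKFold cannot stratify a continuous target directly, so build the
# folds from the bin labels and hand the precomputed splits to cross_val_score
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(skf.split(X_train, bins_train))

# Assumed model: ordinary least-squares linear regression, scored by R^2
model = LinearRegression()
scores = cross_val_score(model, X_train, y_train, cv=folds, scoring='r2')

print("R^2 per fold:", np.round(scores, 3))
print("mean R^2: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```

Only the training portion is cross-validated here, so `X_test` and `y_test` stay untouched for a final check once a model has been chosen.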