Standardization and normalization

Up until now, we have dealt with identifying the types of data as well as the ways data can be missing and finally, the ways we can fill in missing data. Now, let's talk about how we can manipulate our data (and our features) in order to enhance our machine pipelines further. So far, we have tried four different ways of manipulating our dataset, and the best cross-validated accuracy we have achieved with a KNN model is .745. If we look back at some of the EDA we have previously done, we will notice something about our features:

impute = Imputer(strategy='mean')# we will want to fill in missing values to see all 9 columnspima_imputed_mean = pd.DataFrame(impute.fit_transform(pima), columns=pima_column_names) ...

Get Feature Engineering Made Easy now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.