Skip to Content
Python Data Analysis Cookbook
book

Python Data Analysis Cookbook

by Ivan Idris
July 2016
Beginner to intermediate
462 pages
9h 14m
English
Packt Publishing
Content preview from Python Data Analysis Cookbook

Recursively eliminating features

If we have many features (explanatory variables), it is tempting to include them all in our model. However, we then run the risk of overfitting—getting a model that works very well for the training data and very badly for unseen data. Not only that, but the model is bound to be relatively slow and require a lot of memory. We have to weigh accuracy (or an other metric) against speed and memory requirements.

We can try to ignore features or create new better compound features. For instance, in online advertising, it is common to work with ratios, such as the ratio of views and clicks related to an ad. Common sense or domain knowledge can help us select features. In the worst-case scenario, we may have to rely on correlations ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Machine Learning Cookbook - Second Edition

Python Machine Learning Cookbook - Second Edition

Giuseppe Ciaburro, Prateek Joshi
Python: End-to-end Data Analysis

Python: End-to-end Data Analysis

Phuong Vothihong, Martin Czygan, Ivan Idris, Magnus Vilhelm Persson, Luiz Felipe Martins
Python Data Science Essentials - Third Edition

Python Data Science Essentials - Third Edition

Alberto Boschetti, Luca Massaron, Pietro Marinelli, Matteo Malosetti

Publisher Resources

ISBN: 9781785282287Supplemental Content