Feature Engineering

People come to me as a data scientist with their data. Then my job becomes part data-hazmat officer, part grief counselor.


Chapter 6, Value Imputation, looked at filling in missing values. In Chapter 5, Data Quality, we touched on normalization and scaling, which adjust values to artificially fit certain numeric or categorical patterns. Both of those earlier topics come close to the subject of this chapter, but here we focus more directly on creating synthetic features from raw datasets. Whereas imputation is a matter of making reasonable guesses about what missing values might be, feature engineering is about changing the representational form of data, but in ways that are deterministic and often ...
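To make "deterministic change of representational form" concrete, here is a minimal sketch in plain Python. The record fields (`when`, `height_m`, `weight_kg`) and the helper `engineer()` are hypothetical, chosen only for illustration, and body-mass index is just one example of a ratio feature. The point is that each new column is computed purely from existing ones, so the same raw record always produces the same features, with no guessing involved.

```python
from datetime import date

# Hypothetical raw records: an ISO date string and two numeric measurements.
raw = [
    {"when": "2021-03-14", "height_m": 1.75, "weight_kg": 70.0},
    {"when": "2021-03-15", "height_m": 1.60, "weight_kg": 64.0},
]

def engineer(record):
    """Return a new dict with synthetic features derived from the raw fields.

    Hypothetical helper: the original record is left unmodified.
    """
    d = date.fromisoformat(record["when"])
    return {
        **record,
        # Day-of-week name re-expresses the date in a different form.
        "weekday": d.strftime("%A"),
        # Weekend flag: date.weekday() uses Monday=0 ... Sunday=6.
        "is_weekend": d.weekday() >= 5,
        # Ratio feature combining two raw columns (body-mass index).
        "bmi": round(record["weight_kg"] / record["height_m"] ** 2, 1),
    }

features = [engineer(r) for r in raw]
for f in features:
    print(f["when"], f["weekday"], f["is_weekend"], f["bmi"])
```

Because `engineer()` builds a fresh dict rather than mutating its input, the raw data stays available for comparison, which matters when deciding later which derived features actually help a model.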
