12 Setting the stage: Preparing features for machine learning

This chapter covers

  • How investing in a solid data manipulation foundation makes data preparation a breeze
  • Addressing big data quality problems with PySpark
  • Creating custom features for your ML model
  • Selecting compelling features for your model
  • Using transformers and estimators as part of the feature engineering process

I get excited doing machine learning, but not for the reasons most people do. I love getting into a new data set and trying to solve a problem. Each data set sports its own problems and idiosyncrasies, and getting it “ML ready” is extremely satisfying. Building a model gives purpose to data transformation; you ingest, clean, profile, and torture the data for a higher ...
