Data Analysis with Python and PySpark

by Jonathan Rioux
March 2022
Beginner to intermediate
456 pages
13h
English
Manning Publications
12 Setting the stage: Preparing features for machine learning

This chapter covers

  • How investing in a solid data manipulation foundation makes data preparation a breeze
  • Addressing big data quality problems with PySpark
  • Creating custom features for your ML model
  • Selecting compelling features for your model
  • Using transformers and estimators as part of the feature engineering process

I get excited doing machine learning, but not for the reasons most people do. I love getting into a new data set and trying to solve a problem. Each data set sports its own problems and idiosyncrasies, and getting it “ML ready” is extremely satisfying. Building a model gives purpose to data transformation; you ingest, clean, profile, and torture the data for a higher ...

ISBN: 9781617297205