Data Analysis with Python and PySpark

by Jonathan Rioux
March 2022
Beginner to intermediate
456 pages
13h
English
Manning Publications

Part 3. Get confident: Using machine learning with PySpark

Parts 1 and 2 were all about data transformation, but we’re going to go above and beyond that by tackling scalable machine learning in part 3. While not a complete treatment of machine learning in itself, this part will give you the foundation to write your own ML programs in a robust and repeatable fashion.

Chapter 12 sets the stage for machine learning by building features, curated bits of information to use for the training process. Feature engineering itself is akin to purposeful data transformation. Get ready to use the skills learned in parts 1 and 2!

Chapter 13 introduces ML pipelines, Spark’s way to encapsulate ML workflows in a robust and repeatable way. Now, more importantly ...



Publisher Resources

ISBN: 9781617297205