Skip to Content
Data Analysis with Python and PySpark
book

Data Analysis with Python and PySpark

by Jonathan Rioux
March 2022
Beginner to intermediate content levelBeginner to intermediate
456 pages
13h
English
Manning Publications
Content preview from Data Analysis with Python and PySpark

front matter

preface

While computers have been getting more powerful and more capable of chewing though larger data sets, our appetite for consuming data grows much faster. Consequently, we built new tools to scale big data jobs across multiple machines. This does not come for free, and early tools were complicated by requiring users to manage not only the data program, but also the health and performance of the cluster of machines themselves. I recall trying to scale my own programs, only to be faced with the advice to “just sample your data set and get on with your day.”

PySpark changes the game. Starting with the popular Python programming language, it provides a clear and readable API to manipulate very large data sets. Still, while in the ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Data Analysis with Pandas and Python

Data Analysis with Pandas and Python

Boris Paskhaver

Publisher Resources

ISBN: 9781617297205Supplemental ContentPublisher SupportOtherPublisher WebsiteSupplemental ContentPurchase Link