Data Analysis with Python and PySpark

by Jonathan Rioux
March 2022
Beginner to intermediate
456 pages
13h
English
Manning Publications
Content preview from Data Analysis with Python and PySpark

Part 2. Get proficient: Translate your ideas into code

With two different kinds of programs under your belt, it’s time to expand your horizons. Part 2 is about diversifying your set of tools so that no data set will hold any secrets for you.

Chapter 6 breaks the rows-and-columns mold to go multidimensional. Through JSON data, we build data frames that contain data frames themselves. This tool catapults the versatility of the Spark data frame to completely new horizons.

Chapter 7 brings PySpark and SQL together. Combined, they unlock a new level of expressiveness and succinctness in your code, allow you to scale SQL workflows at record speed, and provide a new way to reason about your analyses.

Chapters 8 and 9 cover going full Python with your ...


ISBN: 9781617297205