Data Analysis with Python and PySpark

by Jonathan Rioux
March 2022
Beginner to intermediate
456 pages
13h
English
Manning Publications

Appendix B. Installing PySpark

This appendix covers the installation of standalone Spark and PySpark on your own computer, whether it’s running Windows, macOS, or Linux. I also briefly cover cloud offerings, should you want to easily take advantage of PySpark’s distributed nature.

Having a local PySpark cluster means that you’ll be able to experiment with the syntax using smaller data sets. You don’t have to acquire multiple computers or spend money on managed PySpark on the cloud until you’re ready to scale your programs. Once you’re ready to work on a larger data set, you can easily transfer your program to a cloud instance of Spark for additional power.
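As a minimal sketch of that workflow, the snippet below starts Spark in local mode and lets the same program point at a remote cluster later. It assumes PySpark is already installed. The environment variable `SPARK_MASTER_URL` and the helper names `spark_master`, `build_session`, and `demo` are hypothetical, chosen only for illustration. Managed cloud offerings often configure the master for you, so treat this as one possible arrangement rather than the book's method.

```python
import os


def spark_master(default="local[*]"):
    """Return the Spark master URL to use.

    SPARK_MASTER_URL is a hypothetical variable name used for this sketch.
    When it is unset, fall back to local mode on all available cores.
    """
    return os.environ.get("SPARK_MASTER_URL", default)


def build_session(app_name="local-experiment"):
    """Create (or reuse) a SparkSession against the chosen master."""
    # Imported here so the helper above works even without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(spark_master())
        .appName(app_name)
        .getOrCreate()
    )


def demo():
    """Build a tiny data frame locally and count its rows."""
    spark = build_session()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    row_count = df.count()
    spark.stop()
    return row_count
```

Calling `demo()` on a machine with PySpark and Java installed runs the job entirely on that machine. Only the master URL would need to change for the same code to target a cluster.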

B.1 Installing PySpark on your local machine

This section covers installing Spark and Python ...


ISBN: 9781617297205