Data Analysis with Python and PySpark

by Jonathan Rioux
March 2022
Content level: Beginner to intermediate
456 pages
13h
English
Manning Publications

Part 1. Get acquainted: First steps in PySpark

When working with a new technology, the best way to get familiar with it is to jump right in, building our intuition along the way. This first part succinctly introduces PySpark before going over two distinct use cases.

Chapter 1 introduces the technology and the computing model that power Spark.

Then, in chapters 2 and 3, we build a simple end-to-end program and learn how to structure PySpark code in a readable and intuitive fashion. We go from ingesting text data, to processing it, to presenting the results, and, finally, to submitting the program in a noninteractive fashion.

Chapters 4 and 5 look at working with tabular data, the most frequently used type of data. We build on ...



ISBN: 9781617297205