© Ramcharan Kakarla, Sundar Krishnan and Sridhar Alla 2021
R. Kakarla et al., Applied Data Science Using PySpark, https://doi.org/10.1007/978-1-4842-6500-0_2

2. PySpark Basics

Ramcharan Kakarla (1), Sundar Krishnan (1) and Sridhar Alla (2)
(1) Philadelphia, PA, USA
(2) New Jersey, NJ, USA

This chapter will help you understand the basic operations of PySpark. We encourage you to set up a PySpark environment and try the operations that follow on a dataset of your choice to deepen your understanding. Because Spark is a very large topic, we give you just enough content to get started with PySpark basics and concepts before moving on to data-wrangling activities. This chapter will demonstrate the most common data operations in PySpark that you may encounter ...
