© Ramcharan Kakarla, Sundar Krishnan and Sridhar Alla 2021
R. Kakarla et al., Applied Data Science Using PySpark, https://doi.org/10.1007/978-1-4842-6500-0_1

1. Setting Up the PySpark Environment

Ramcharan Kakarla (1), Sundar Krishnan (1) and Sridhar Alla (2)
(1) Philadelphia, PA, USA
(2) New Jersey, NJ, USA

The goal of this chapter is to get you set up quickly with a PySpark environment. Several installation options are discussed, so you can pick whichever suits you best. If you already have an environment ready, you can skip to the “Basic Operations” section later in this chapter.

In this chapter, we will cover the following topics:
  • Local installation using Anaconda
  • Docker-based installation
  • Databricks community edition

Local Installation Using Anaconda

Step 1: ...
