© Ramcharan Kakarla, Sundar Krishnan and Sridhar Alla 2021
R. Kakarla et al., Applied Data Science Using PySpark, https://doi.org/10.1007/978-1-4842-6500-0_1

1. Setting Up the PySpark Environment

Ramcharan Kakarla (1), Sundar Krishnan (1) and Sridhar Alla (2)
(1) Philadelphia, PA, USA
(2) New Jersey, NJ, USA

The goal of this chapter is to get you set up with a PySpark environment quickly. Several options are discussed, so you can pick the one that suits you best. If you already have an environment ready, you can skip to the “Basic Operations” section later in this chapter.

In this chapter, we will cover the following topics:
  • Local installation using Anaconda

  • Docker-based installation

  • Databricks community edition
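Before choosing one of these options, it can help to check whether a machine already has a usable setup. The following is a minimal sketch, not part of the original chapter: the helper name check_environment is hypothetical, and the sketch assumes that a local setup needs the pyspark Python package plus a java executable on the PATH. It only looks for these two things and does not start Spark.

```python
import importlib.util
import shutil


def check_environment(module_name="pyspark", executable="java"):
    """Report whether a Python module and a command-line program are available.

    Hypothetical helper for illustration; the defaults assume a local
    PySpark setup that relies on the pyspark package and a Java runtime.
    """
    # find_spec returns None when a top-level module cannot be located.
    module_found = importlib.util.find_spec(module_name) is not None
    # shutil.which returns the full path of the program, or None if absent.
    executable_path = shutil.which(executable)
    return {
        "module": module_name,
        "module_found": module_found,
        "executable": executable,
        "executable_path": executable_path,
    }


if __name__ == "__main__":
    status = check_environment()
    print(f"{status['module']} importable: {status['module_found']}")
    print(f"{status['executable']} found at: {status['executable_path']}")
```

If both checks succeed, the environment may already be ready; if either fails, one of the installation options in this chapter is the next step.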

Local Installation using Anaconda

Step 1: ...
