Setting up your local Spark instance

A full installation of Apache Spark is not an easy task to perform from scratch. It is usually carried out on a cluster of computers, often hosted in the cloud, and delegated to specialists in the technology (namely, data engineers). This could be a limitation, because you may not then have access to an environment in which to test what you will be learning in this chapter.

However, in order to test the contents of this chapter, you do not actually need an overly complex installation. By using Docker, you can have access to an installation of Spark, together with a Jupyter Notebook and PySpark, on a Linux server running on your own computer (it does not matter if it is a ...
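As a minimal sketch of the Docker-based setup described above, you can pull a community image that bundles Spark, PySpark, and Jupyter. The image name `jupyter/pyspark-notebook` and the port mapping are assumptions about your environment; adjust them to the image and port you actually use:

```shell
# Download a prebuilt image containing Spark, PySpark, and Jupyter Notebook.
# (jupyter/pyspark-notebook is an assumption; any equivalent image works.)
docker pull jupyter/pyspark-notebook

# Start a container, exposing Jupyter's default port 8888 on the host.
# --rm removes the container when it stops; -it attaches an interactive terminal.
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
```

On startup, the container logs a local URL including an access token; opening it in a browser gives you a notebook environment where `import pyspark` should work out of the box.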
