© Raju Kumar Mishra 2018
Raju Kumar Mishra, PySpark Recipes, https://doi.org/10.1007/978-1-4842-3141-8_2

2. Installation

Raju Kumar Mishra
Bangalore, Karnataka, India

In the upcoming chapters, we are going to solve many problems by using PySpark. PySpark also interacts with many other big data frameworks to provide end-to-end solutions. It can read data from HDFS, NoSQL databases, or a relational database management system (RDBMS). After analyzing the data, we can save the results back into HDFS or a database.

This chapter covers all the software installations that are required to go through this book. We are going to install all the required big data frameworks on the CentOS operating system. CentOS is an enterprise-class operating system. ...
