In the upcoming chapters, we are going to solve many problems using PySpark. PySpark also interacts with many other big data frameworks to provide end-to-end solutions. For example, PySpark might read data from HDFS, NoSQL databases, or a relational database management system (RDBMS). After the data analysis, we can also save the results into HDFS or a database.
This chapter covers all the software installations that are required to go through this book. We are going to install all the required big data frameworks on the CentOS operating system. CentOS is an enterprise-class operating system. ...