Mastering Apache Cassandra 3.x - Third Edition

by Aaron Ploetz, Tejaswi Malepati
October 2018
Content level: Beginner to intermediate
348 pages
10h
English
Packt Publishing
Content preview from Mastering Apache Cassandra 3.x - Third Edition

PySpark through Jupyter

If Spark is already installed on the machine and SPARK_HOME is set, the findspark pip package can locate that installation and make it available to Jupyter. Install the package as follows:

pip install findspark
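Once findspark is installed, a notebook cell along the following lines could connect the kernel to the existing Spark installation. This is a minimal sketch: the helper name init_existing_spark is hypothetical, and the explicit SPARK_HOME check is an added safeguard rather than something findspark requires.

```python
import os


def init_existing_spark():
    """Make the Spark install named by SPARK_HOME importable in this process.

    Hypothetical helper for illustration; returns the Spark home that was found.
    """
    if "SPARK_HOME" not in os.environ:
        # Assumption: we only want to proceed when SPARK_HOME is set explicitly.
        raise RuntimeError("SPARK_HOME is not set; nothing for findspark to locate")

    # Imported lazily so the cell still loads before `pip install findspark`.
    import findspark

    findspark.init()          # adds Spark's Python and py4j paths to sys.path
    return findspark.find()   # path of the Spark installation that was located
```

After the helper returns, `import pyspark` should resolve against the installed Spark rather than a pip-installed copy.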

If Spark is not already installed, the PySpark package will not be present by default. In that case, using PySpark through Jupyter requires installing it with the following command:

pip install pyspark
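After the install, it can help to confirm that the notebook kernel actually sees the package before starting a session. The sketch below is illustrative only: the helper names, the local[*] master, and the application name are assumptions, not settings taken from the book.

```python
import importlib.util


def pyspark_installed():
    """Return True if the pyspark package is importable in this environment."""
    return importlib.util.find_spec("pyspark") is not None


def local_session(app_name="jupyter-pyspark"):
    """Start (or reuse) a local SparkSession; assumes pyspark and a JVM are available."""
    # Imported lazily so this cell loads even when pyspark is missing.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")      # assumption: run Spark locally on all cores
        .appName(app_name)       # hypothetical application name
        .getOrCreate()
    )
```

In a notebook, one would typically call pyspark_installed() first and only call local_session() when it returns True.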

For example, suppose a business wants to know the total number of orders placed by each user. Because Cassandra's built-in aggregation is limited, Spark lets us perform the required transformations, along with sorting, to produce a cleaner report. Setting ...
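The per-user order count described above could be sketched with the PySpark DataFrame API as follows. This is not the book's own code: the function name, the user_id column, and the assumption that the orders have already been loaded into a DataFrame (for example, from a Cassandra table through a connector) are all hypothetical.

```python
def orders_per_user(orders_df, user_col="user_id"):
    """Count orders per user and sort users by order count, highest first.

    `orders_df` is assumed to be a PySpark DataFrame with one row per order;
    `user_col` is a hypothetical column name identifying the user.
    """
    return (
        orders_df.groupBy(user_col)          # one group per user
        .count()                             # adds a `count` column: rows per group
        .orderBy("count", ascending=False)   # most active users first
    )
```

Calling .show() on the result in a notebook cell would print the sorted report. Doing the grouping and sorting in Spark keeps the aggregation out of Cassandra itself.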

Publisher Resources

ISBN: 9781789131499