Summary
In this chapter, we showed some examples of how to write your Spark code in Python and R. These are the most popular programming languages in the data scientist community.
We covered the motivation of using PySpark and SparkR for big data analytics with almost similar ease with Java and Scala. We discussed how to install these APIs on their popular IDEs such as PyCharm for PySpark and RStudio for SparkR. We also showed how to work with DataFrames and RDDs from these IDEs. Furthermore, we discussed how to execute Spark SQL queries from PySpark and SparkR. Then we also discussed how to perform some analytics with visualization of the dataset. Finally, we saw how to use UDFs with PySpark with examples.
Thus, we have discussed several ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access