Writing your first Spark program

As mentioned before, you can use Spark with Python, Scala, Java, and R. We have different executable shell scripts available in the /spark/bin directory and so far, we have just looked at Spark shell, which can be used to explore data using Scala. The following executables are available in the spark/bin directory. We'll use most of these during the course of this book:

  • beeline
  • PySpark
  • run-example
  • spark-class
  • sparkR
  • spark-shell
  • spark-sql
  • spark-submit

Whatever shell you use, based on your past experience or aptitude, you have to deal with one abstract that is your handle to the data available on the spark cluster, be it local or spread over thousands of machines. The abstraction we are referring to here is called Resilient ...

Get Learning Apache Spark 2 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.