Running Spark jobs locally and in standalone mode

The examples were shown in Chapter 13, My Name is Bayes, Naive Bayes, and can be made scalable to even larger datasets for different purposes. You can package all three of these clustering algorithms with all the required dependencies and submit them as a Spark job to the cluster. If you don't know how to make a package and create JAR files from a Scala class, you can bundle your application with all of its dependencies using SBT or Maven.
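
As an illustration, a minimal sketch of such a submission might look like the following. The main class (com.example.KMeansDemo), the assembly JAR path, the master URL, and the input path are placeholder assumptions, not names taken from this book:

    # Run locally, using all available cores (placeholder class and paths)
    $SPARK_HOME/bin/spark-submit \
      --class com.example.KMeansDemo \
      --master local[*] \
      target/scala-2.11/spark-clustering-assembly-0.1.jar \
      /path/to/input/data

    # Or submit the same JAR to a standalone cluster master
    $SPARK_HOME/bin/spark-submit \
      --class com.example.KMeansDemo \
      --master spark://master-host:7077 \
      --executor-memory 4G \
      target/scala-2.11/spark-clustering-assembly-0.1.jar \
      /path/to/input/data

The only difference between the two invocations is the --master URL: local[*] runs the driver and executors in a single local JVM, while spark://host:port hands the job to a standalone cluster manager.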

According to the Spark documentation at http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management, both SBT and Maven have assembly plugins for packaging your Spark application as a fat JAR. If your application is already bundled ...
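
For SBT, a minimal sbt-assembly setup might look like the following sketch; the plugin version, Spark version, and project name are assumptions. Marking Spark as "provided" keeps it out of the fat JAR, since the cluster supplies it at runtime:

    // project/plugins.sbt -- registers the sbt-assembly plugin (version is an assumption)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

    // build.sbt -- project definition (names and versions are assumptions)
    name := "spark-clustering"
    version := "0.1"
    scalaVersion := "2.11.12"

    // "provided" excludes Spark from the assembly; the cluster supplies it at runtime
    libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0" % "provided"

    // Resolve duplicate files (for example, META-INF entries) when merging dependency JARs
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }

Running sbt assembly then produces a single fat JAR under target/scala-2.11/ that can be passed directly to spark-submit, as shown earlier.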
