July 2017
Intermediate to advanced
796 pages
18h 55m
English
The examples shown in this chapter can be made scalable for the even larger dataset to serve different purposes. You can package all three clustering algorithms with all the required dependencies and submit them as a Spark job in the cluster. Now use the following lines of code to submit your Spark job of K-means clustering, for example (use similar syntax for other classes), for the Saratoga NY Homes dataset:
# Run application as standalone mode on 8 cores SPARK_HOME/bin/spark-submit \ --class org.apache.spark.examples.KMeansDemo \ --master local[8] \ KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \ Saratoga_NY_Homes.txt# Run on a YARN cluster export HADOOP_CONF_DIR=XXX SPARK_HOME/bin/spark-submit ...
Read now
Unlock full access