Submitting a Spark job for cluster analysis
The examples shown in this chapter can be scaled up to much larger datasets. You can package all three clustering algorithms, along with their required dependencies, into a single JAR and submit it as a Spark job to the cluster. Use the following commands to submit the K-means clustering job for the Saratoga NY Homes dataset, for example (the syntax is similar for the other classes):
# Run the application locally on 8 cores
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.KMeansDemo \
  --master local[8] \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt

# Run on a YARN cluster (use --deploy-mode client for client mode instead)
export HADOOP_CONF_DIR=XXX
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.KMeansDemo \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt
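For reference, here is a minimal sketch of what the submitted KMeansDemo class might look like; it is an assumption for illustration, not the chapter's exact implementation. It reads the dataset path from args(0), which receives the last argument passed to spark-submit above, and it assumes the input file holds comma-separated numeric columns. The -jar-with-dependencies suffix in the JAR name suggests the fat JAR is built with the Maven assembly plugin (for example, via mvn clean package), although the exact build configuration is likewise an assumption.

package org.apache.spark.examples

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KMeansDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KMeansDemo").getOrCreate()

    // args(0) is the dataset path, e.g. Saratoga_NY_Homes.txt
    val lines = spark.sparkContext.textFile(args(0))

    // Parse each line into a dense feature vector; assumes
    // comma-separated numeric fields (skip a header row first if one exists)
    val features = lines
      .map(line => Vectors.dense(line.split(",").map(_.trim.toDouble)))
      .cache()

    // Train K-means; k = 4 and 20 iterations are illustrative values
    val model = KMeans.train(features, 4, 20)

    // Print the within-set sum of squared errors as a rough quality measure
    println(s"WSSSE = ${model.computeCost(features)}")

    spark.stop()
  }
}

Reading the path from args(0) rather than hard-coding it is what allows the same JAR to run unchanged in local mode and on a YARN cluster.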