- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
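If the project is built with sbt rather than by adding JAR files manually, the dependencies could be declared roughly as follows. This is only a sketch: the project name and the Scala and Spark version numbers are assumptions, not values taken from this recipe, and should be adjusted to match your installation.

```scala
// build.sbt (hypothetical sketch; project name and versions are assumptions)
name := "spark-ml-cookbook"

scalaVersion := "2.11.12"

// Assumed Spark release; replace with the version you actually run.
val sparkVersion = "2.4.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-sql"   % sparkVersion, // provides SparkSession
  "org.apache.spark" %% "spark-mllib" % sparkVersion  // provides GaussianMixture and Vectors
)
```

With Spark 2.x, the log4j classes imported in this recipe arrive as a transitive dependency of Spark, so they need no separate entry.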
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter8
- Import the necessary packages for logging, Gaussian mixture clustering, vector handling, and the Spark session:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession
- Create Spark's session object:
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("myGaussianMixture")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()
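To show how the imports and the session fit together, here is a minimal, self-contained sketch. It is not the recipe's own code: the object name `GaussianMixtureSketch`, the toy points, the choice of two components, and the seed are all made up for illustration, and the real recipe works on the dataset described in the next step.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object GaussianMixtureSketch {
  def main(args: Array[String]): Unit = {
    // Reduce Spark's console output to errors only.
    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("myGaussianMixture")
      .config("spark.sql.warehouse.dir", ".")
      .getOrCreate()

    // Made-up 2-D points forming two loose groups (not the recipe's data).
    val points = spark.sparkContext.parallelize(Seq(
      Vectors.dense(0.1, 0.3), Vectors.dense(0.4, -0.2),
      Vectors.dense(-0.3, 0.2), Vectors.dense(0.2, 0.5),
      Vectors.dense(5.1, 4.8), Vectors.dense(4.7, 5.3),
      Vectors.dense(5.4, 5.1), Vectors.dense(4.9, 4.6)
    ))

    // Fit a mixture with an assumed number of components (k = 2).
    val model = new GaussianMixture()
      .setK(2)
      .setSeed(42L)
      .run(points)

    // Print each component's mixing weight and mean vector.
    model.weights.zip(model.gaussians).foreach { case (weight, gaussian) =>
      println(s"weight=$weight mu=${gaussian.mu}")
    }

    spark.stop()
  }
}
```

The RDD-based `GaussianMixture` in `org.apache.spark.mllib.clustering` expects an `RDD[Vector]`, which is why the sketch goes through `spark.sparkContext` instead of building a DataFrame.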
- Let us take a look at the dataset and examine the input file. The Simulated SOCR Knee Pain Centroid ...