How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.

  2. Set up the package location where the program will reside:

package spark.ml.cookbook.chapter8

  3. Import the necessary packages for vector and matrix manipulation:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

  4. Create Spark's session object:

val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("myGaussianMixture")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()

  5. Let us take a look at the dataset and examine the input file. The Simulated SOCR Knee Pain Centroid ...
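The excerpt stops before the input file is described, so what follows is only a minimal end-to-end sketch of where the steps above lead, not the recipe's own code. It reuses the same imports and session settings and fits the RDD-based MLlib `GaussianMixture`. Several things are assumptions: the data is taken to be plain text with whitespace-separated numeric columns, the real file is replaced by a small synthetic dataset, and `k = 2` and the seeds are illustrative. `parse`, `syntheticLines`, and `fitModel` are hypothetical helper names.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.clustering.{GaussianMixture, GaussianMixtureModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

import scala.util.Random

object GaussianMixtureSketch {

  // Hypothetical input format: one observation per line, whitespace-separated numbers
  def parse(line: String): Vector =
    Vectors.dense(line.trim.split("\\s+").map(_.toDouble))

  // Synthetic stand-in for the data file (not the real dataset):
  // 50 noisy 2-D points around each of two made-up centers
  def syntheticLines(seed: Long = 7L): Seq[String] = {
    val rng = new Random(seed)
    for {
      (cx, cy) <- Seq((1.0, 1.0), (8.0, 8.0))
      _ <- 1 to 50
    } yield s"${cx + rng.nextGaussian() * 0.5} ${cy + rng.nextGaussian() * 0.5}"
  }

  // Fit a k-component Gaussian mixture with expectation-maximization
  def fitModel(spark: SparkSession, lines: Seq[String], k: Int): GaussianMixtureModel = {
    // Cache the parsed vectors because EM makes several passes over the data
    val data = spark.sparkContext.parallelize(lines).map(line => parse(line)).cache()
    new GaussianMixture().setK(k).setSeed(42L).run(data)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: silence Spark's INFO logging so the printed results stay readable
    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("myGaussianMixture")
      .config("spark.sql.warehouse.dir", ".")
      .getOrCreate()

    val model = fitModel(spark, syntheticLines(), k = 2)

    // Each fitted component has a mixing weight, a mean vector, and a covariance matrix
    for (i <- 0 until model.k) {
      val g = model.gaussians(i)
      println(s"weight=${model.weights(i)}\nmu=${g.mu}\nsigma=\n${g.sigma}\n")
    }

    spark.stop()
  }
}
```

With the real data, `syntheticLines()` would give way to reading the input file (for example with `spark.sparkContext.textFile(...)`), and `k` would be chosen to match the number of clusters being sought.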

Get Apache Spark 2: Data Processing and Real-Time Analytics now with the O’Reilly learning platform.