How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.

  2. Set up the package location where the program will reside:

  3. Import the necessary packages for vector and matrix manipulation:

 import org.apache.log4j.{Level, Logger}
 import org.apache.spark.mllib.clustering.GaussianMixture
 import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.sql.SparkSession
  4. Create Spark's session object:

 val spark = SparkSession
   .builder
   .master("local[*]")
   .appName("myGaussianMixture")
   .config("spark.sql.warehouse.dir", ".")
   .getOrCreate()
  5. Let us take a look at the dataset and examine the input file. The Simulated SOCR Knee Pain Centroid ...
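
The steps so far can be tied together in a minimal sketch that shows how the imported classes are typically used with the session. It is not the recipe's own code: the object name `GaussianMixtureSketch`, the helper names `session` and `fitToyMixture`, the seed, and the ten two-dimensional points are all assumptions made for illustration. The toy points stand in for the dataset, which is not reproduced here.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.clustering.{GaussianMixture, GaussianMixtureModel}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the name is not from the recipe.
object GaussianMixtureSketch {

  // Builds the same local session as in the step above.
  def session(): SparkSession = {
    // Reduce Spark's console output to errors only.
    Logger.getLogger("org").setLevel(Level.ERROR)
    SparkSession
      .builder
      .master("local[*]")
      .appName("myGaussianMixture")
      .config("spark.sql.warehouse.dir", ".")
      .getOrCreate()
  }

  // Fits a two-component mixture to made-up points (assumed data,
  // not the dataset discussed in the recipe).
  def fitToyMixture(spark: SparkSession): GaussianMixtureModel = {
    val points = spark.sparkContext.parallelize(Seq(
      Vectors.dense(1.0, 1.2), Vectors.dense(1.4, 0.8),
      Vectors.dense(0.7, 1.1), Vectors.dense(1.2, 1.5),
      Vectors.dense(0.9, 0.6), Vectors.dense(9.0, 9.3),
      Vectors.dense(9.5, 8.8), Vectors.dense(8.7, 9.1),
      Vectors.dense(9.2, 9.6), Vectors.dense(8.9, 8.5)
    ))
    new GaussianMixture()
      .setK(2)       // number of Gaussian components (assumed)
      .setSeed(42L)  // arbitrary seed so repeated runs match
      .run(points)
  }

  def main(args: Array[String]): Unit = {
    val spark = session()
    val model = fitToyMixture(spark)
    // Print each component's mixing weight, mean vector, and covariance matrix.
    for (i <- 0 until model.k) {
      println(s"weight=${model.weights(i)}")
      println(s"mu=${model.gaussians(i).mu}")
      println(s"sigma=\n${model.gaussians(i).sigma}")
    }
    spark.stop()
  }
}
```

With real data, the `parallelize` call would be replaced by reading the input file and mapping each row to a dense vector; the number of components would then be chosen to suit that data rather than fixed at two.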
