How to do it...

  1. Start a new project in IntelliJ or in an editor of your choice and make sure all the necessary JAR files (Scala and Spark) are available to your application.
  2. Import the necessary packages for vector and matrix manipulation:
     import org.apache.spark.mllib.linalg.distributed.RowMatrix
     import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
     import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
     import org.apache.spark.sql.{SparkSession}
     import org.apache.spark.mllib.linalg._
     import breeze.linalg.{DenseVector => BreezeVector}
     import Array._
     import org.apache.spark.mllib.linalg.DenseMatrix
     import org.apache.spark.mllib.linalg.SparseVector
  3. Set up the Spark context and application ...
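
As a rough illustration of how the imported types fit together once a session exists, here is a minimal sketch. It is not the recipe's own code: the object name `LinalgSketch`, the application name, the `local[*]` master setting, and the sample values are all assumptions chosen for the example. It needs the Spark SQL and MLlib JARs from step 1 on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.{DenseMatrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical object name, used only for this sketch.
object LinalgSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally on all cores; the app name is illustrative.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("LinalgSketch")
      .getOrCreate()

    // Local vectors: one dense, one sparse (size, indices, values).
    val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)
    val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

    // Local 2 x 3 matrix; values are supplied in column-major order.
    val local = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

    // Distributed matrix: each vector in the RDD becomes one row.
    val rows = spark.sparkContext.parallelize(Seq(dense, sparse))
    val distributed = new RowMatrix(rows)

    println(s"local matrix: ${local.numRows} x ${local.numCols}")
    println(s"row matrix: ${distributed.numRows()} x ${distributed.numCols()}")

    spark.stop()
  }
}
```

The local types (`Vectors`, `DenseMatrix`) live on the driver, while `RowMatrix` wraps an RDD and is spread across the cluster, which is why it needs a running session before it can be built.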
