- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the necessary packages so that the Spark session can access the cluster and the evaluation classes:
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.evaluation.MultilabelMetrics
import org.apache.spark.rdd.RDD
- Create a SparkSession with the necessary configuration to get access to the cluster:
val spark = SparkSession.builder
  .master("local[*]")
  .appName("myMultilabel")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()
- Create the dataset for the evaluation model:
val data: RDD[(Array[Double], Array[Double])] = spark.sparkContext.parallelize(Seq((Array(0.0, ...
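The dataset line above is truncated, so here is a minimal, self-contained sketch of the full flow. The label values are illustrative pairs of (predicted labels, true labels), not the original data, and the metrics printed (`accuracy`, `precision`, `recall`) are part of the public `MultilabelMetrics` API:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.evaluation.MultilabelMetrics
import org.apache.spark.rdd.RDD

object MultilabelSketch extends App {
  val spark = SparkSession.builder
    .master("local[*]")
    .appName("myMultilabel")
    .config("spark.sql.warehouse.dir", ".")
    .getOrCreate()

  // Each tuple is (predicted label set, true label set) for one record.
  // These values are illustrative, not the book's original dataset.
  val data: RDD[(Array[Double], Array[Double])] = spark.sparkContext.parallelize(Seq(
    (Array(0.0, 1.0), Array(0.0, 2.0)),   // partially correct prediction
    (Array(0.0, 2.0), Array(0.0, 1.0)),
    (Array.empty[Double], Array(0.0)),    // missed the label entirely
    (Array(2.0), Array(2.0)),             // exact match
    (Array(2.0, 0.0), Array(2.0, 0.0)),
    (Array(0.0, 1.0, 2.0), Array(0.0, 1.0)),
    (Array(1.0), Array(1.0, 2.0))
  ))

  val metrics = new MultilabelMetrics(data)
  println(s"Accuracy  = ${metrics.accuracy}")
  println(s"Precision = ${metrics.precision}")
  println(s"Recall    = ${metrics.recall}")

  spark.stop()
}
```

Each prediction/label pair is an array of label indices rather than a single value, which is what distinguishes multilabel evaluation from the binary and multiclass metrics classes.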