We are close. The goal of this section is to create an input feature vector. The steps are as follows:
- Import the Vectors class.
- Inside the map operation on the Array, we iterate over each row of our header-free dataset. We transform each row in turn, operating on every column that contains predetermined cell nuclei measurements. The values in these columns are converted to doubles and packed into a feature vector by the dense method.
- The map operation processes the entire dataset and produces featureVectorArray, a structure of type Array[(Input Feature Vector, String representing the Class)]:
//Step 1
scala> import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.linalg.Vectors
//Step 2
scala> val featureVectorArray ...
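Because the listing above is elided, the following is a minimal sketch of the same idea rather than the original code. It assumes a hypothetical in-memory `rows` array with made-up values, in which the first column is an identifier, the last column is the class label, and everything in between is a numeric measurement. None of these names, values, or column positions come from the actual dataset. It is meant to be pasted into a Spark shell session:

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical stand-in for the header-free dataset (made-up values).
// Assumed layout per row: identifier, measurements..., class label.
val rows: Array[Array[String]] = Array(
  Array("id-1", "5", "1", "1", "classA"),
  Array("id-2", "8", "10", "10", "classB")
)

// For each row: take the measurement columns, convert each value to a
// Double, pack the values into a dense vector, and pair that vector
// with the class label, which stays a String.
val featureVectorArray: Array[(Vector, String)] = rows.map { row =>
  val measurements = row.slice(1, row.length - 1).map(_.toDouble)
  (Vectors.dense(measurements), row.last)
}

featureVectorArray.foreach(println)
```

The sketch slices the measurement columns by position so that it does not depend on a fixed column count. If the real rows have a different layout, only the `slice` bounds and the label index need to change.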