Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Projecting data using PCA on the LFW dataset

We will illustrate this concept by projecting each LFW image into a ten-dimensional vector. This is done by multiplying the image matrix by the matrix of principal components. Because the image matrix is a distributed MLlib RowMatrix, Spark takes care of distributing this computation for us through the multiply method.

val projected = matrix.multiply(pc)
println(projected.numRows, projected.numCols)

The preceding code prints the following output:

(1055,10)

Observe that each image, which originally had 2500 dimensions, has been transformed into a vector of size 10. Let's take a look at the first few vectors:

println(projected.rows.take(5).mkString("\n"))

Here is the output: ...
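
To make the projection concrete, the following is a minimal plain-Scala sketch of the arithmetic behind the multiplication: each row of the data matrix is dotted with each column of the component matrix. It is not how Spark implements multiply, and it does not use Spark at all. The object name ProjectionSketch, the helper project, and the toy values are hypothetical; the component matrix here is made up for illustration and is not a set of real principal components.

```scala
// Plain-Scala sketch of a projection by matrix multiplication; no Spark required.
object ProjectionSketch {
  type Row = Array[Double]

  // Multiply an n x d data matrix (one Row per image) by a d x k matrix of
  // components, stored here as d rows of length k. Returns an n x k matrix.
  def project(data: Seq[Row], pc: Array[Array[Double]]): Seq[Row] = {
    val d = pc.length
    val k = pc.head.length
    data.map { row =>
      require(row.length == d, s"expected $d features, got ${row.length}")
      // Output entry j is the dot product of the row with column j of pc.
      Array.tabulate(k)(j => (0 until d).map(i => row(i) * pc(i)(j)).sum)
    }
  }

  def main(args: Array[String]): Unit = {
    // Toy stand-in for the image matrix: 3 "images" with 4 features each.
    val data = Seq(
      Array(1.0, 2.0, 3.0, 4.0),
      Array(0.0, 1.0, 0.0, 1.0),
      Array(2.0, 0.0, 2.0, 0.0)
    )
    // Hypothetical 4 x 2 component matrix (illustrative values only).
    val pc = Array(
      Array(1.0, 0.0),
      Array(0.0, 1.0),
      Array(1.0, 0.0),
      Array(0.0, 1.0)
    )
    val projected = project(data, pc)
    println((projected.length, projected.head.length)) // prints (3,2)
    println(projected.map(_.mkString("[", ",", "]")).mkString("\n"))
  }
}
```

The shapes follow the same pattern as the LFW example: an n x d matrix times a d x k matrix gives an n x k matrix, so 1055 images of 2500 dimensions each become 1055 vectors of size 10.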
