Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Running PCA on the LFW dataset

Now that we have extracted our image pixel data into vectors, we can instantiate a new RowMatrix.

def computePrincipalComponents(k: Int): Matrix

Computes the top k principal components. Rows correspond to observations, and columns correspond to variables. The principal components are stored as a local matrix of size n-by-k. Each column corresponds to one principal component, and the columns are in descending order of component variance. The row data do not need to be "centered" first; it is not necessary for the mean of each column to be 0. Note that this cannot be computed on matrices with more than 65535 columns.

k is the number of top principal components. It returns a matrix of size n-by-k, whose columns ...
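As a rough illustration of the two steps described above (wrapping vectors in a RowMatrix, then calling computePrincipalComponents), here is a minimal sketch. It assumes spark-core and spark-mllib are on the classpath. The object name PcaSketch, the helper topComponents, and the tiny hand-written dataset are hypothetical stand-ins for illustration; they are not the book's code, and the real input would be the image pixel vectors extracted earlier.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

object PcaSketch {

  // Hypothetical helper: wrap an RDD of vectors in a RowMatrix
  // (one row per observation) and return the top k principal components.
  def topComponents(rows: RDD[Vector], k: Int): Matrix = {
    val matrix = new RowMatrix(rows)
    matrix.computePrincipalComponents(k)
  }

  def main(args: Array[String]): Unit = {
    // Local context for illustration only; in spark-shell, `sc` already exists.
    val conf = new SparkConf().setAppName("pca-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Toy data (an assumption): 5 observations with n = 4 variables each.
      val vectors: RDD[Vector] = sc.parallelize(Seq(
        Vectors.dense(2.0, 0.5, 1.0, 3.0),
        Vectors.dense(1.0, 1.5, 0.0, 2.5),
        Vectors.dense(3.0, 0.0, 2.0, 4.0),
        Vectors.dense(0.5, 2.0, 1.5, 1.0),
        Vectors.dense(2.5, 1.0, 0.5, 3.5)
      ))

      val k = 2
      val pc = topComponents(vectors, k)

      // The result is a local n-by-k matrix: one column per component.
      println(s"principal components: ${pc.numRows} x ${pc.numCols}")
    } finally {
      sc.stop()
    }
  }
}
```

Because the result is a local matrix rather than a distributed one, n (the number of columns in the input) has to be small enough for an n-by-k matrix to fit on the driver, which is consistent with the column limit noted in the method description.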
