Singular value decomposition and principal components analysis

It is quite common to have a dataset where the number of users and items number in the millions. Even if the rating matrix is not that large, it may be beneficial to reduce the dimensionality by creating a smaller (lower-rank) matrix that captures most of the information in the higher-dimension matrix. This may potentially allow you to capture important latent factors and their corresponding weights in the data. Such factors could lead to important insights, such as the movie genre or book topics in the rating matrix. Even if you are unable to discern meaningful factors, the techniques may filter out the noise in the data.

One issue with large datasets is that you will likely ...

Get Mastering Machine Learning with R - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.