July 2018
406 pages
As we have seen, it is best to normalize the data to remove obvious movie- or user-specific effects. We will use one very simple type of normalization that we have used before: conversion to z-scores.
Unfortunately, we cannot simply use scikit-learn's normalization objects, as we have to deal with the missing values in our data (that is, not all movies were rated by all users). Thus, we want to normalize by the mean and standard deviation of only those values that are actually present.
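To make the idea concrete, here is a minimal sketch of z-scoring with missing values using plain NumPy. The small ratings matrix is made up for illustration, and two choices are assumptions rather than anything fixed by the text: missing ratings are encoded as NaN, and the statistics are computed per movie (per column).

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, NaN = not rated.
ratings = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 1.0, 3.0],
])

# Per-movie mean and standard deviation, computed only over present ratings.
mean = np.nanmean(ratings, axis=0)
std = np.nanstd(ratings, axis=0)

# Missing entries stay NaN; present entries become z-scores.
zscores = (ratings - mean) / std
```

Each present rating is now expressed relative to the other ratings of the same movie, and the missing entries are left untouched instead of distorting the mean.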
We will write our own class that ignores missing values. This class will follow the scikit-learn preprocessing API. We can even derive from scikit-learn's TransformerMixin class, which gives us a fit_transform method once we have defined fit and transform:
from sklearn.base import ...
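Since the listing above is cut off, the following is only a sketch of what such a class could look like, not the book's own implementation. The class name NaNNormalizer, the NaN encoding of missing ratings, the per-column statistics, and the guard for constant columns are all assumptions made for this example.

```python
import numpy as np
from sklearn.base import TransformerMixin


class NaNNormalizer(TransformerMixin):
    """Hypothetical transformer: z-score each column, ignoring NaN entries."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Statistics over the values that are actually present.
        self.mean_ = np.nanmean(X, axis=0)
        std = np.nanstd(X, axis=0)
        # Avoid division by zero for columns with a single distinct value.
        std[std == 0] = 1.0
        self.std_ = std
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # NaN entries propagate, so missing ratings remain missing.
        return (X - self.mean_) / self.std_

    def inverse_transform(self, X):
        # Map z-scores back to the original rating scale.
        return np.asarray(X, dtype=float) * self.std_ + self.mean_


# Made-up ratings: rows are users, columns are movies, NaN = not rated.
ratings = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 1.0, 3.0],
])

norm = NaNNormalizer()
normalized = norm.fit_transform(ratings)  # provided by TransformerMixin
restored = norm.inverse_transform(normalized)
```

Only fit and transform are written by hand here; fit_transform comes from TransformerMixin. The inverse_transform method is an optional extra that is convenient when predictions made on normalized data need to be mapped back to ratings.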