In some specific scenarios, the dataset can be structured like a matrix, where the rows represent a category and the columns represent another category. For example, let's suppose we have a set of feature vectors representing the preference (or rating) that a user expressed for a group of items. In this example, we can randomly create such a matrix, forcing 50% of ratings to be null (this is realistic considering that a user never rates all possible items):
import numpy as npnb_users = 100nb_products = 150max_rating = 10up_matrix = np.random.randint(0, max_rating + 1, size=(nb_users, nb_products))mask_matrix = np.random.randint(0, 2, size=(nb_users, nb_products))up_matrix *= mask_matrix
In this case, we are assuming that 0 means ...