The standard kmeans function does not have a prediction method. However, we can use the flexclust package, which does. Since the prediction method can take a long time to run, we will illustrate it only on a sample of the rows and columns. To compare the training and test results, both datasets also need to have the same number of columns. For illustration purposes, we will set that number to 10.
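As a rough sketch of the idea, the snippet below fits a k-means style clustering with flexclust's kcca function and then assigns new rows to the fitted clusters with predict. The train and test matrices are synthetic stand-ins invented for illustration (they are not the OnlineRetail data), and the choice of 3 clusters is an arbitrary assumption. Both matrices have 10 columns, which is what lets predict work on the second one.

```r
library(flexclust)

set.seed(1)

# Hypothetical stand-in data: random numbers, NOT the OnlineRetail data.
# Both matrices share the same number of columns (10).
train <- matrix(rnorm(200 * 10), ncol = 10)
test  <- matrix(rnorm(50 * 10), ncol = 10)

# Fit a k-means style clustering; k = 3 is an arbitrary choice for this sketch.
fit <- kcca(train, k = 3, family = kccaFamily("kmeans"))

# Assign each test row to the nearest fitted centroid.
test.clusters <- predict(fit, newdata = test)

# Count how many test rows fall into each cluster.
table(test.clusters)
```

The result of predict is a vector with one cluster number per test row, so it can be tabulated or compared against the training assignments returned by clusters(fit).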
To begin, take a sample from the OnlineRetail training data:
set.seed(1)
sample.size <- 10000
max.cols <- 10
library("flexclust")
OnlineRetail <- OnlineRetail[1:sample.size, ]
Next, create the document term matrix from the description column in the sampled dataset. We will use the create_matrix function ...