June 2017
Beginner to intermediate
576 pages
English
Now let's switch over to the test dataset. First, take the first sample.size rows, the same number of rows we used for the training data, and then repeat the procedure, starting with creating the document-term matrix for the sample:
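# Keep the same number of rows as in the training sample
OnlineRetail.test <- OnlineRetail.test[1:sample.size, ]

# Build the document-term matrix with the same preprocessing options as for training
dtMatrix.test <- create_matrix(OnlineRetail.test$Description,
                               minDocFreq = 1, removeNumbers = TRUE,
                               minWordLength = 4, removeStopwords = TRUE,
                               removePunctuation = TRUE, stemWords = FALSE,
                               weighting = weightTf)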
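The create_matrix() call bundles several preprocessing steps into one function. To make those steps visible, here is a minimal base-R sketch that approximates them on a few descriptions. It is not how RTextTools or tm implement them internally: the descriptions, the tokenize() helper, and the short stopword list are all hypothetical, and tm's tokenizer and full English stopword list may give slightly different results.

```r
# Hypothetical product descriptions standing in for OnlineRetail.test$Description
descriptions <- c("WHITE HANGING HEART T-LIGHT HOLDER",
                  "SET 7 BABUSHKA NESTING BOXES",
                  "WHITE METAL LANTERN WITH HOOK")

# Tiny stand-in stopword list (assumption; tm ships a much longer English list)
stop_words <- c("the", "and", "with", "from", "this")

# Hypothetical helper approximating the create_matrix() options used above
tokenize <- function(text) {
  text <- tolower(text)
  text <- gsub("[[:punct:]]+", "", text)   # removePunctuation: T-LIGHT -> tlight
  text <- gsub("[[:digit:]]+", "", text)   # removeNumbers
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  words <- words[nchar(words) >= 4]        # minWordLength = 4 drops "set"
  words[!(words %in% stop_words)]          # removeStopwords drops "with"
}

tokens <- lapply(descriptions, tokenize)
vocab  <- sort(unique(unlist(tokens)))     # minDocFreq = 1: keep every term seen

# weightTf: each cell holds the raw count of a term in a document
dtm <- t(vapply(tokens,
                function(w) as.integer(table(factor(w, levels = vocab))),
                integer(length(vocab))))
dimnames(dtm) <- list(NULL, vocab)
dtm
```

Rows are documents and columns are terms, which matches the orientation of dtMatrix.test (one row per description).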
As we did before, remove sparse terms from the matrix. Then, use the dim() function to see how many non-sparse terms remain:
dtMatrix.test <- removeSparseTerms(dtMatrix.test, 0.99)
dim(dtMatrix.test) # reduced to 61 terms
[1] 10000 61
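To see what the 0.99 threshold means, here is a base-R sketch of the rule as we read it in tm's removeSparseTerms() source: a term is kept only if it appears in more than nrow * (1 - sparse) documents. With 10,000 descriptions and sparse = 0.99, a term must therefore occur in more than 100 of them. The keep_nonsparse() function and the toy matrix are hypothetical and only illustrate the rule; they are not part of tm.

```r
# Hypothetical re-implementation of the sparsity rule (base R only)
keep_nonsparse <- function(dtm, sparse) {
  doc_freq <- colSums(dtm > 0)   # number of documents containing each term
  dtm[, doc_freq > nrow(dtm) * (1 - sparse), drop = FALSE]
}

# Toy matrix: 200 documents, so the cutoff is "more than 2 documents"
toy <- cbind(common = rep(1L, 200),                      # in all 200 documents
             rare   = c(rep(1L, 2), rep(0L, 198)),       # in 2 documents (1%)
             medium = c(rep(1L, 3), rep(0L, 197)))       # in 3 documents (1.5%)

reduced <- keep_nonsparse(toy, 0.99)
dim(reduced)
colnames(reduced)
```

The term that appears in exactly 1% of the documents is dropped, because the comparison is strictly "more than", while the one in 1.5% survives.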
Take the first ...