Learning Predictive Analytics with R by Eric Mayor

Data preparation

In this section, we will start by preprocessing the corpus for analysis and then inspecting it. We will then build the training and testing data frames.

Preprocessing and inspecting the corpus

We can see that the joint corpus contains 2,000 documents, as requested. We can now apply the preprocessing steps discussed in the preceding section. To do so, we will build a function that performs them all at once (we will reuse this function later in the chapter):

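# Install the stemming package once (the function below loads it)
install.packages("SnowballC")

# stopwords() and tm_map() come from the tm package, assumed to be loaded already
preprocess = function(corpus, stopwrds = stopwords("english")){
  library(SnowballC)
  # Convert all text to lowercase
  corpus = tm_map(corpus, content_transformer(tolower))
  # Strip punctuation, then digits
  corpus = tm_map(corpus, removePunctuation)
  corpus = tm_map(corpus, content_transformer(removeNumbers))
  ...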
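To get a feel for what the first three steps of the function do to raw text, we can try them on a small character vector. The following is only a sketch in base R: the `docs` vector is made up for illustration and is not part of the chapter's corpus, and `tolower()` with `gsub()` are used here as rough stand-ins for the `tm` transformations rather than as the book's own code.

```r
# Hypothetical sample documents (not taken from the book's corpus)
docs <- c("Great movie, 10/10!", "Worst film of 2015...")

# Step 1: convert to lowercase
lowered <- tolower(docs)

# Step 2: remove punctuation (assumed to approximate removePunctuation)
no_punct <- gsub("[[:punct:]]+", "", lowered)

# Step 3: remove digits (assumed to approximate removeNumbers)
no_digits <- gsub("[[:digit:]]+", "", no_punct)

print(no_digits)
```

Note that removing characters can leave stray spaces behind, which is one reason a preprocessing pipeline of this kind usually also normalizes whitespace at some point.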
