Practical Predictive Analytics by Ralph Winters

Replacing missing values

The frequencies of lastword show that some of its values are blank. The lastword column appears to contain mostly nouns, while firstword is a mix of adjectives and nouns (for example, bag and heart). If we treat the text strings as a Bag of Words, we can justify combining the two into a single token. However, we will give lastword priority and populate it with the value of firstword only when lastword is missing:

# replace blank values in lastword with the value of firstword
OnlineRetail$lastword <- ifelse(OnlineRetail$lastword == "", OnlineRetail$firstword, OnlineRetail$lastword)
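To see how ifelse() behaves here, the sketch below applies the same statement to a small hypothetical data frame named toy. Only the column names firstword and lastword come from OnlineRetail. The rows are invented for illustration. Rows with a non-blank lastword keep it, and rows with a blank lastword take firstword instead.

```r
# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  firstword = c("heart", "bag", "red", "lunch"),
  lastword  = c("holder", "", "", "box"),
  stringsAsFactors = FALSE
)

# Keep lastword where present; fall back to firstword where lastword is blank
toy$lastword <- ifelse(toy$lastword == "", toy$firstword, toy$lastword)

# "holder" and "box" are kept; rows 2 and 3 become "bag" and "red"
print(toy$lastword)
```

One caveat: ifelse() propagates NA. If lastword held NA rather than an empty string, the test lastword == "" would itself be NA, and those rows would stay NA. A condition such as is.na(lastword) | lastword == "" catches both cases.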

After this replacement, we take another look at the frequencies and see that the blank values have disappeared:

 head(as.data.frame(sort(table(OnlineRetail$lastword[]), ...
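The command above is truncated in this excerpt, so the following is not a reconstruction of it. It is a separate base-R sketch, again on hypothetical toy data (the names toy, freq, and n_blank are illustrative), showing one way to tabulate lastword by frequency and confirm that no blank strings remain after the replacement.

```r
# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  firstword = c("heart", "bag", "red", "lunch", "heart"),
  lastword  = c("holder", "", "", "box", "holder"),
  stringsAsFactors = FALSE
)

# Same replacement as above: blank lastword values fall back to firstword
toy$lastword <- ifelse(toy$lastword == "", toy$firstword, toy$lastword)

# Frequency table of lastword, most frequent values first
freq <- sort(table(toy$lastword), decreasing = TRUE)
print(head(as.data.frame(freq)))

# Number of blank lastword values left; 0 means every blank was filled
n_blank <- sum(toy$lastword == "")
print(n_blank)
```

as.data.frame() turns the one-dimensional table into a data frame with a value column (Var1) and a count column (Freq), which is convenient to inspect with head().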