
Practical Predictive Analytics by Ralph Winters

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.


Making the final subset

Based upon these frequencies, we will filter the data to only include a subset of the top categories. We will exclude some of the terms that do not apply to the physical product, such as design, set, and any associated colors:

# Testing OnlineRetail2 <- OnlineRetail
OnlineRetail2 <- subset(OnlineRetail, lastword %in% c("BAG", "CASES", "HOLDER",
    "BOX", "SIGN", "CHRISTMAS", "BOTTLE", "BUNTING", "MUG", "BOWL", "CANDLES",
    "COVER", "HEART"))
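To see what the `%in%` filter does in isolation, here is a minimal sketch on a small made-up data frame. The `toy` data frame, its values, and the `keep` vector are hypothetical and exist only for illustration; only the `lastword` column name comes from the text above.

```r
# Hypothetical toy data standing in for OnlineRetail (values are made up)
toy <- data.frame(
  Description = c("JUMBO BAG", "RED DESIGN", "METAL SIGN", "TEA SET", "LUNCH BAG"),
  lastword = c("BAG", "DESIGN", "SIGN", "SET", "BAG"),
  stringsAsFactors = FALSE
)

# Categories to keep (an illustrative subset of the list used above)
keep <- c("BAG", "SIGN")

# subset() retains only the rows whose lastword matches an entry in keep
toy2 <- subset(toy, lastword %in% keep)

nrow(toy2)             # number of rows retained
unique(toy2$lastword)  # categories that remain
```

Rows ending in "DESIGN" and "SET" are dropped because those terms are not in `keep`, which mirrors how terms that do not describe the physical product are excluded from the real data.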

Run the table() function again on the results to see the new frequencies:

head(as.data.frame(sort(table(OnlineRetail2$lastword[]), decreasing = TRUE)), 10)
> sort(table(OnlineRetail2$lastword[]), decreasing = TRUE)
> HOLDER 6792
> BOX 6528
> SIGN 6184
> BAG 5761
...
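As a small self-contained illustration of this frequency check, the sketch below counts categories in a made-up vector. The `words` vector and its values are hypothetical. It also assumes the column might be stored as a factor, which the text does not state; in that case, excluded categories can still appear in `table()` output with a count of zero unless `droplevels()` is applied first.

```r
# Hypothetical values standing in for the lastword column (made up)
words <- factor(c("BAG", "SIGN", "BAG", "BOX", "BAG", "SIGN", "SET"))

# Filter to the categories of interest; unused factor levels are retained
kept <- words[words %in% c("BAG", "SIGN", "BOX")]

# Without droplevels(), "SET" would still be listed with a count of 0
freq <- sort(table(droplevels(kept)), decreasing = TRUE)

# Show the most frequent categories as a data frame
head(as.data.frame(freq), 10)
```

If the column is a plain character vector, `droplevels()` is unnecessary, since `table()` then counts only the values that are actually present.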
