Practical Predictive Analytics by Ralph Winters

Scrubbing and cleaning the data

Here comes the cleaning part!

Print the first five grocery items from the Description field of OnlineRetail:

kable(OnlineRetail$Description[1:5], col.names = c("Grocery Item Descriptions"))

|Grocery Item Descriptions                 |
|:-----------------------------------------| 
|WHITE HANGING HEART T-LIGHT HOLDER        | 
|METAL METAL LANTERN                       | 
|CREAM CUPID HEARTS COAT HANGER            | 
|KNITTED UNION FLAG HOT WATER BOTTLE       | 
|RED WOOLLY HOTTIE WHITE HEART.            | 

Although each line contains a separate grocery item, the items are not in a uniform format: the number of words describing each item can vary, and some words are adjectives while others are nouns. Additionally, the retailer may deem certain words to be irrelevant to a particular ...
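To make the variation in word count concrete, here is a minimal base R sketch. It is not the book's own cleaning code. The `descriptions` vector is a small hand-typed stand-in for `OnlineRetail$Description`, and `irrelevant_words` is a purely hypothetical list of words a retailer might choose to drop.

```r
# Hand-typed stand-in for OnlineRetail$Description (assumption, not the real data frame)
descriptions <- c(
  "WHITE HANGING HEART T-LIGHT HOLDER",
  "CREAM CUPID HEARTS COAT HANGER",
  "KNITTED UNION FLAG HOT WATER BOTTLE",
  "RED WOOLLY HOTTIE WHITE HEART."
)

# Hypothetical words the retailer considers irrelevant (illustrative only)
irrelevant_words <- c("WHITE", "RED", "CREAM")

# Remove a trailing period, keeping internal hyphens such as "T-LIGHT"
no_trailing_period <- sub("[[:space:]]*\\.$", "", descriptions)

# Split each description into its individual words
words <- strsplit(no_trailing_period, "[[:space:]]+")

# The number of words differs from item to item
word_counts <- lengths(words)

# Drop the irrelevant words and rebuild each description
kept <- lapply(words, function(w) w[!w %in% irrelevant_words])
cleaned_descriptions <- vapply(kept, paste, character(1), collapse = " ")

print(word_counts)
print(cleaned_descriptions)
```

Splitting on whitespace keeps hyphenated terms intact as single tokens. Which words count as irrelevant is a business decision, so the list above should be treated as a placeholder.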