Practical Predictive Analytics by Ralph Winters

Merging the results back into the original data

We want to keep the total number of items for each invoice on the original data frame. To do this, we join the item count for each invoice back to the original transactions with the merge() function, specifying Invoicenum as the key.
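The join described above can be sketched in base R as follows. This is a minimal illustration on a small hypothetical data frame, not the book's code. It assumes the invoice key column is named `InvoiceNo` (as in the sqldf query in this section), and `transactions`, `item_counts` and `Itemcount` are illustrative names chosen here.

```r
# Hypothetical toy transactions: one row per line item (not the real OnlineRetail data)
transactions <- data.frame(
  InvoiceNo = c("1001", "1001", "1002", "1003", "1003", "1003"),
  Quantity  = c(6, 6, 6, 32, 6, 6),
  stringsAsFactors = FALSE
)

# Count the line items on each invoice; aggregate() returns one row per invoice
item_counts <- aggregate(Quantity ~ InvoiceNo, data = transactions, FUN = length)
names(item_counts)[2] <- "Itemcount"   # hypothetical name for the per-invoice count

# Join the per-invoice counts back onto every transaction row, keyed on the invoice number
merged <- merge(transactions, item_counts, by = "InvoiceNo")
print(merged)
```

Each transaction row now carries the item count of the invoice it belongs to, which is the shape the text is after.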

If you count the number of distinct invoices before and after the merge, you can see that the invoice count is lower after the merge than before it:

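library(sqldf)  # sqldf() runs SQL queries against R data frames

#first take a 'before' snapshot: the total number of transaction rows
nrow(OnlineRetail) 
# [1] 541909 

#count the number of distinct invoices 
sqldf("select count(distinct InvoiceNo) from OnlineRetail")  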

The output shows a total of 25900 distinct invoices:

>   count(distinct InvoiceNo) 
> 1                     25900  
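The excerpt does not say why the invoice count drops. One likely cause is that merge() performs an inner join by default (`all = FALSE`), so any invoice with no matching row in the count table is silently dropped. The sketch below uses hypothetical toy data (invoice "1004" is deliberately missing from the count table) to show this, together with a base-R equivalent of the `count(distinct ...)` query and a left join (`all.x = TRUE`) that keeps every invoice.

```r
# Hypothetical toy data: invoice "1004" is deliberately missing from the count table
transactions <- data.frame(
  InvoiceNo = c("1001", "1001", "1002", "1003", "1004"),
  Quantity  = c(6, 6, 6, 32, 1),
  stringsAsFactors = FALSE
)
item_counts <- data.frame(
  InvoiceNo = c("1001", "1002", "1003"),
  Itemcount = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# Base-R equivalent of: select count(distinct InvoiceNo) from transactions
n_before <- length(unique(transactions$InvoiceNo))

# merge() defaults to an inner join (all = FALSE): unmatched invoices are dropped
inner <- merge(transactions, item_counts, by = "InvoiceNo")
n_inner <- length(unique(inner$InvoiceNo))

# all.x = TRUE keeps every transaction row; unmatched counts become NA
left <- merge(transactions, item_counts, by = "InvoiceNo", all.x = TRUE)
n_left <- length(unique(left$InvoiceNo))

cat("before:", n_before, " inner:", n_inner, " left:", n_left, "\n")
```

Comparing the distinct counts before and after the merge is a quick way to detect rows lost to an inner join; whether dropping them is desirable depends on why they were missing from the count table.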

Now merge the counts back with the ...