June 2017
Beginner to intermediate
576 pages
15h 22m
English
We will want to retain the number of total items for each invoice on the original data frame. That will involve joining the number of items contained in each invoice back to the original transactions, using the merge() function, and specifying Invoicenum as the key.
If you count the number of distinct invoices before and after the merge, you can see that the invoice count is lower than prior to the merge:
#first take a 'before' snapshot
nrow(OnlineRetail)
> [1] 541909
#count the number of distinct invoices
sqldf("select count(distinct InvoiceNo) from OnlineRetail")
The output shows a total of 25900 distinct invoices:
> count(distinct InvoiceNo) > 1 25900
Now merge the counts back with the ...