We will want to retain the number of total items for each invoice on the original data frame. That will involve joining the number of items contained in each invoice back to the original transactions, using the merge() function, and specifying Invoicenum as the key.
If you count the number of distinct invoices before and after the merge, you can see that the invoice count is lower than prior to the merge:
#first take a 'before' snapshot nrow(OnlineRetail) >  541909 #count the number of distinct invoices sqldf("select count(distinct InvoiceNo) from OnlineRetail")
The output shows a total of 25900 distinct invoices:
> count(distinct InvoiceNo) > 1 25900
Now merge the counts back with the ...