Association rules need baskets that contain more than one item, so we will filter out the invoices that have only a single item. Those invoices might be useful for a separate analysis of customers who purchased just one item, but they do not help with finding associations between multiple items, which is the goal of this exercise.
- Let's use sqldf to find all of the single-item transactions, and then create a separate data frame holding the number of items per customer invoice:
- First, construct a query to count the distinct invoices. It shows that there were 25900 separate invoices:
sqldf("select count(distinct InvoiceNo) from OnlineRetail") ...
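
The per-invoice item count and the single-item filter described above could be sketched as follows. This is only an illustration under stated assumptions: the toy `OnlineRetail` data frame stands in for the real data, the `StockCode` column is assumed to identify an item, and the names `items_per_invoice`, `single_item`, and `multi_item` are hypothetical, not taken from the original analysis.

```r
library(sqldf)

# Toy stand-in for the real OnlineRetail data frame (hypothetical values).
# StockCode is assumed to be the column that identifies an item.
OnlineRetail <- data.frame(
  InvoiceNo = c("A1", "A1", "A2", "A3", "A3", "A3"),
  StockCode = c("S1", "S2", "S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)

# Number of distinct items on each invoice
items_per_invoice <- sqldf("
  select InvoiceNo, count(distinct StockCode) as n_items
  from OnlineRetail
  group by InvoiceNo
")

# Invoices that contain exactly one item
single_item <- sqldf("
  select InvoiceNo
  from items_per_invoice
  where n_items = 1
")

# Rows that belong to invoices with more than one item;
# these are the baskets usable for association rules
multi_item <- sqldf("
  select o.*
  from OnlineRetail o
  join items_per_invoice i on o.InvoiceNo = i.InvoiceNo
  where i.n_items > 1
")
```

Counting distinct `StockCode` values rather than rows is a design choice here: it keeps an invoice that lists the same item on two lines from being treated as a multi-item basket.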