Now we can take a look at the distribution of the number of items. Using the mean() function, we can see that there are about 27 items per invoice on average. This should be a large enough assortment of items to do a meaningful analysis:
This is the output:
>  27
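As a minimal sketch of that calculation, assume a data frame with one row per invoice and a numeric itemcount column. The toy data frame and its values below are made up for illustration and are not the real x2 data; on the real data the call would presumably be mean(x2$itemcount):

```r
# Hypothetical stand-in for x2: one row per invoice, with its item count.
# The values are invented for illustration only.
toy <- data.frame(
  invoice   = c("A1", "A2", "A3", "A4"),
  itemcount = c(2, 5, 30, 71)
)

# Average number of items per invoice.
avg_items <- mean(toy$itemcount)
print(avg_items)
```

Note that a mean like this can be pulled upward by a few very large invoices, which is one reason to look at the full distribution as well.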
We can also plot a histogram:
hist(x2$itemcount, breaks = 500, xlim = c(0, 50))
The histogram shown next has a definite spike at the low end. We know that the data cannot contain single-item invoices (count = 1), since we have already filtered them out:
To verify this, we can inspect the itemcount frequencies via a table. Run the following ...
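As a rough sketch of this kind of check, the snippet below builds a frequency table on a small made-up data frame. The name toy and its values are hypothetical and are not the x2 data; the same idea would apply to x2$itemcount:

```r
# Hypothetical toy data; the itemcount values are invented for illustration.
toy <- data.frame(itemcount = c(2, 2, 3, 5, 5, 5, 12))

# Frequency of each distinct item count, ordered by increasing count.
freq <- table(toy$itemcount)
print(head(freq))

# TRUE if no invoice has exactly one item.
no_singles <- !any(toy$itemcount == 1)
print(no_singles)
```

If the earlier filtering worked, the table should have no entry for a count of 1, and the smallest counts should account for the spike seen in the histogram.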