June 2017
Now we can look at the distribution of the number of items. The mean() function shows an average of about 27 items per invoice, a large enough assortment to support a meaningful analysis:
mean(x2$itemcount)
This produces the following output:
> [1] 27
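A mean on its own can hide a skewed distribution, so it can help to compare it with the median. The following is a minimal sketch only: the toy data frame and its values are made up for illustration and stand in for x2, which is not reproduced here.

```r
# Hypothetical stand-in for x2: one row per invoice with an itemcount column.
# These values are invented for illustration; they are not the book's data.
toy <- data.frame(itemcount = c(2, 2, 3, 3, 4, 5, 8, 12, 40, 191))

avg <- mean(toy$itemcount)    # 27: pulled upward by the two large invoices
mid <- median(toy$itemcount)  # 4.5: half of the invoices have fewer items

print(avg)
print(mid)
summary(toy$itemcount)        # min, quartiles, mean, and max in one call
```

If the mean sits well above the median, as it does in this toy data, a few large invoices are driving the average while most invoices are small.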
We can also plot a histogram:
hist(x2$itemcount, breaks = 500, xlim = c(0, 50))
The histogram that follows shows a definite spike at the low end. We know that the data cannot contain single-item invoices (count=1), since we have already filtered them out:

To verify this, we can inspect the itemcount frequencies via a table. Run the following ...