It can be quite useful to compare the distributions of two sets of numbers, such as two variables or two vectors. Both sets might be measurements, or one might be a theoretical distribution. For example, we might want to see how a particular variable compares with the theoretical “normal” distribution.
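As a minimal sketch of the second case, the base R code below overlays a normal curve on the density of a sample. The data are simulated purely for illustration (the vector x, the seed, and the exponential distribution are assumptions, not part of any dataset discussed here), and the normal curve simply borrows the sample's mean and standard deviation.

```r
# Hypothetical data: 200 simulated values from a right-skewed distribution.
set.seed(1)
x <- rexp(200, rate = 1/3)

# Kernel density estimate of the sample.
d <- density(x)

# Normal curve with the same mean and standard deviation as the sample,
# evaluated over the range covered by the density estimate.
grid <- seq(min(d$x), max(d$x), length.out = 200)
normal_y <- dnorm(grid, mean = mean(x), sd = sd(x))

# Draw the sample density, then overlay the theoretical curve.
plot(d, main = "Sample density vs. normal curve", col = "blue", lwd = 2,
     ylim = c(0, max(d$y, normal_y)))
lines(grid, normal_y, col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("sample", "normal"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)
```

Where the two curves part company, the sample departs from normality; with skewed data like this, the mismatch is most visible in the right tail.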
In the United States and many other parts of the world, it is customary for customers to leave a tip for people who perform services. Just how much to give is a topic of frequent discussion among restaurant patrons. The reshape2 package includes a dataset, tips, compiled by a waiter from the tips his own customers gave him. Let’s take a look inside this interesting dataset:
    > library(reshape2)
    > attach(tips)
    > head(tips)
      total_bill  tip    sex smoker day   time size
    1      16.99 1.01 Female     No Sun Dinner    2
    2      10.34 1.66   Male     No Sun Dinner    3
    3      21.01 3.50   Male     No Sun Dinner    3
    4      23.68 3.31   Male     No Sun Dinner    2
    5      24.59 3.61 Female     No Sun Dinner    4
    6      25.29 4.71   Male     No Sun Dinner    4
Now, we’ll try to learn more about the tip variable. First, how are the tips distributed? We could plot the density of tip to get an idea of that:
    # Figure 15-1a
    library(reshape2)        # provides the tips dataset
    attach(tips)             # lets us refer to tip directly
    par(mfrow = c(3,2))      # 3 x 2 grid of panels for the figure
    plot(density(tip), main = "a. Density(tip)", col = "blue", lwd = 2)
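If attaching the data frame is not wanted (for example, because its column names could mask other objects), the same density can be computed with with(), which evaluates an expression inside the data frame. This is only an alternative sketch, not the approach used for the figure; the name tip_density is introduced here for illustration, and the code assumes the reshape2 package is installed.

```r
# Assumes the reshape2 package is installed; it supplies the tips data frame.
library(reshape2)

# Evaluate the expression inside tips, so `tip` resolves to tips$tip
# without adding the data frame to the search path.
tip_density <- with(tips, density(tip))

# Same panel as Figure 15-1a, drawn from the stored density object.
plot(tip_density, main = "a. Density(tip)", col = "blue", lwd = 2)
```

Storing the result of density() also makes it easy to reuse the estimate later, for instance to overlay it on another plot with lines().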
The plot in Figure 15-1a shows that the distribution is quite skewed; that is, it has a long tail to the right. In other words, a few patrons give relatively ...