Comparing Sets of Numbers

It can be quite useful to compare the distributions of two sets of numbers; for example, two variables or two vectors. Both sets might be measurements, or one might be a theoretical distribution. For example, we might want to see how a particular variable compares with the theoretical “normal” distribution.
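As a minimal sketch of comparing a set of numbers with a theoretical normal distribution, the following uses only base R. The vector `obs` is simulated purely for illustration and is not a dataset from the text; the choice of 200 values, the seed, and the panel titles are likewise assumptions.

```r
# Illustrative only: `obs` is simulated, not data from the text.
set.seed(42)
obs <- rnorm(200, mean = 50, sd = 10)

# Show two panels side by side, remembering the previous settings.
op <- par(mfrow = c(1, 2))

# Left: empirical density with a normal curve that has the sample's
# mean and standard deviation overlaid as a dashed line.
plot(density(obs), main = "Density vs. normal", col = "blue", lwd = 2)
curve(dnorm(x, mean = mean(obs), sd = sd(obs)), add = TRUE, lty = 2)

# Right: normal Q-Q plot; points close to the line suggest the sample
# is roughly normal.
qqnorm(obs, main = "Normal Q-Q")
qqline(obs, col = "red")

# Restore the previous graphics settings.
par(op)
```

The overlaid curve and the Q-Q plot answer the same question in two ways: the first compares overall shape, while the second makes departures in the tails easier to see.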

In the United States and many other parts of the world, it is customary for customers to leave a tip for people who perform services. Just how much to give is a topic of frequent discussion among patrons of restaurants. The `reshape2` package includes a dataset, `tips`, that was compiled by a waiter about tips his own customers gave to him. Let’s take a look inside this interesting dataset:

```
> library(reshape2)
> attach(tips)
> head(tips)
  total_bill  tip    sex smoker day   time size
1      16.99 1.01 Female     No Sun Dinner    2
2      10.34 1.66   Male     No Sun Dinner    3
3      21.01 3.50   Male     No Sun Dinner    3
4      23.68 3.31   Male     No Sun Dinner    2
5      24.59 3.61 Female     No Sun Dinner    4
6      25.29 4.71   Male     No Sun Dinner    4
```

Now, we’ll try to learn more about the `tip` variable. First, how are the tips distributed? We could plot the density of `tip` to get an idea of that:

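```
# Figure 15-1a
library(reshape2)
attach(tips)            # lets us refer to tip directly
par(mfrow = c(3, 2))    # 3-by-2 grid of panels; this plot fills the first
plot(density(tip),      # kernel density estimate of the tip amounts
     main = "a. Density(tip)",
     col = "blue",
     lwd = 2)
```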

The plot in Figure 15-1a shows that the distribution is quite skewed; that is, it has a long tail to the right. In other words, a few patrons give relatively ...
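One rough numerical check for this kind of right skew is to compare the mean with the median, because a long right tail tends to pull the mean above the median. The sketch below is an illustration rather than part of the original text; it assumes `reshape2` is installed, reads the column as `tips$tip` instead of relying on `attach()`, and the names `tip_mean` and `tip_median` are made up for the example.

```r
library(reshape2)

# Mean and median of the tip amounts.
tip_mean   <- mean(tips$tip)
tip_median <- median(tips$tip)

# A mean above the median is consistent with a long right tail.
tip_mean > tip_median

# Upper quantiles show how far the largest tips sit from the bulk.
quantile(tips$tip, probs = c(0.5, 0.9, 0.99))
```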
