Chapter 13. High-Density Plots
Working with Large Datasets
Sometimes a large dataset can be a challenge when applying techniques such as scatter plots. Let’s consider one such dataset from the car package. Vocab contains more than 21,000 observations containing some basic demographic data and scores on a vocabulary test. Load the package and look at the data (be careful to use the head() command; you do not want to print the entire dataset!):
    > library(car)
    > attach(Vocab)
    > head(Vocab)
             year    sex education vocabulary
    20040001 2004 Female         9          3
    20040002 2004 Female        14          6
    20040003 2004   Male        14          9
    20040005 2004 Female        17          8
    20040008 2004   Male        14          1
    20040010 2004   Male        14          7
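Before printing anything from a dataset this size, it can help to check its dimensions and structure first. The following is a minimal sketch, assuming a recent version of car, which loads the companion carData package that supplies Vocab; the row count it reports may differ from the figure quoted above, depending on the installed version.

```r
library(car)  # assumption: recent car versions load carData, which supplies Vocab

dim(Vocab)          # number of rows and columns, without printing the data
str(Vocab)          # compact summary: one line per column
head(Vocab, n = 3)  # only the first three rows
```

None of these calls prints more than a few lines, so they are safe to run on the full data frame.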
It might be interesting to examine the relationship between vocabulary and education. Does it seem reasonable to expect that those with low education will have low vocabulary scores and that the scores will increase as amount of education increases? A scatter plot should make this clear. Here’s how to create it:
    # Figure 13-1
    library(car)
    attach(Vocab)
    plot(education, vocabulary)
    detach(Vocab)
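The listing above relies on attach() and detach() to make the column names visible. As a hedged alternative (a sketch, not the approach used in the listing), the same plot can be requested without touching the search path, either through the formula interface of plot() or by wrapping the call in with():

```r
library(car)  # assumption: recent car versions load carData, which supplies Vocab

# Formula interface: y ~ x, with both columns looked up in Vocab
plot(vocabulary ~ education, data = Vocab)

# Equivalent call, evaluated inside the data frame
with(Vocab, plot(education, vocabulary))
```

Both forms should draw the same points as plot(education, vocabulary), and neither leaves Vocab attached afterward.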
The scatter plot in Figure 13-1 is anything but clear! There is not a simple line or band of points showing the relationship we thought we would see. There is a little whitespace at the upper left and the lower right, but every other place looks equally populated.
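One way to see why the plot looks so uniform is to compare the number of observations with the number of distinct plotting positions. The sketch below is an illustration under the same assumption as before (Vocab available after loading car); the variable names counts, n_obs, and n_positions are introduced here only for the example.

```r
library(car)  # assumption: recent car versions load carData, which supplies Vocab

# Cross-tabulate the two variables: one cell per (education, vocabulary) pair
counts <- table(Vocab$education, Vocab$vocabulary)

n_obs <- nrow(Vocab)             # points that plot() was asked to draw
n_positions <- sum(counts > 0)   # distinct locations that receive a point

n_obs
n_positions
range(counts[counts > 0])        # fewest and most observations at one location
```

If n_positions is far smaller than n_obs, many points are drawn on top of one another, and a location holding a single observation looks the same as one holding hundreds.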
The two ...