Chapter 13. High-Density Plots

Working with Large Datasets

A large dataset can be a challenge for techniques such as scatter plots. Let’s consider one such dataset from the car package. Vocab contains more than 21,000 observations, each with some basic demographic data and a score on a vocabulary test. Load the package and look at the data (be careful to use the head() command; you do not want to print the entire dataset!):

> library(car)
> head(Vocab)

         year    sex education vocabulary
20040001 2004 Female         9          3
20040002 2004 Female        14          6
20040003 2004   Male        14          9
20040005 2004 Female        17          8
20040008 2004   Male        14          1
20040010 2004   Male        14          7

It might be interesting to examine the relationship between vocabulary and education. Does it seem reasonable to expect that those with little education will have low vocabulary scores and that the scores will increase as the amount of education increases? A scatter plot should make this clear. Here’s how to create it:

# Figure 13-1
library(car)
attach(Vocab)
plot(education, vocabulary)
detach(Vocab)
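
As a sketch of an alternative that avoids attach() and detach(), the same plot can be requested through the formula interface of plot(), naming the data frame directly. The guard and the reference to carData are assumptions, not part of the listing above: in recent versions of car (3.0 and later) the Vocab data are supplied by the companion carData package, and the plot is only drawn if that package is installed.

```r
# Sketch: draw the Figure 13-1 scatter plot without attach()/detach().
# Assumption: Vocab is available from the carData package (car >= 3.0).
if (requireNamespace("carData", quietly = TRUE)) {
  data("Vocab", package = "carData")

  # Formula interface: response ~ predictor, columns looked up in `data`
  plot(vocabulary ~ education, data = Vocab)
}
```

Because the columns are looked up inside Vocab, nothing is added to the search path and there is nothing to detach afterward.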

The scatter plot in Figure 13-1 is anything but clear! There is no simple line or band of points showing the relationship we expected to see. There is a little whitespace at the upper left and the lower right, but everywhere else looks equally populated.

Figure 13-1. A scatter plot of education and vocabulary
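
One way to see why such a plot can look uniformly filled is to count how many observations land on each plotting position. The sketch below uses a synthetic stand-in rather than the real Vocab data, and assumes (as the head() output suggests) that both variables take only whole-number values; the ranges 0 to 20 for education and 0 to 10 for vocabulary, and the name demo, are illustrative assumptions.

```r
# Sketch: quantify overplotting for two integer-valued variables.
# `demo` is a synthetic stand-in, NOT the real Vocab data.
set.seed(1)
n <- 21000
demo <- data.frame(
  education  = sample(0:20, n, replace = TRUE),  # assumed range
  vocabulary = sample(0:10, n, replace = TRUE)   # assumed range
)

# Cross-tabulate the two variables; fixing the factor levels keeps
# every possible (education, vocabulary) pair in the table.
counts <- table(
  factor(demo$education,  levels = 0:20),
  factor(demo$vocabulary, levels = 0:10)
)

n_points   <- sum(counts > 0)  # distinct positions that hold at least one point
max_points <- length(counts)   # size of the grid: 21 * 11 positions

cat(n, "observations share", n_points, "of", max_points, "possible positions\n")
```

With thousands of observations stacked on a few hundred positions, each plotted symbol can hide many others, so the plot cannot show where the data are concentrated. To try this on the real data, replace demo with Vocab.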

The two ...
