Sometimes, the apparent relationship between two variables can be quite misleading. This may well be due to a strong association that one or both variables have to a third variable. Consider the
States dataset from the
car package. This is data about the SAT exam, a test that many students in the United States take as part of the college admissions process.
States also contains several other variables about secondary education on the state level in 1992. Each of the 51 observations in this dataset represents one state, or the District of Columbia. Figure 18-1 shows a scatter plot of average scores on the SATM, the math subtest of the SAT, against the amount of money (in thousands of dollars per student) spent on public education in each state.
Here is the code to produce Figure 18-1:
# Figure 18-1 library(car) attach(States) plot(dollars,SATM, pch = 16, col = "maroon") grid(lty = "solid")
Figure 18-1 seems to indicate that states that spent relatively little on education had high SATM scores, whereas higher-spending states had relatively low scores. This is completely counterintuitive! We would expect—or ...