Chapter 6


The alternative to using graphics is to summarize your data in tabular form. Broadly speaking, if you want to convey detail, use a table; if you want to show effects, use graphics. You are more likely to want a table when your explanatory variables are categorical (such as people's names, or different commodities) than when they are continuous (in which case a scatterplot is likely to be more informative; see p. 189).

There are two very important functions that you need to distinguish:

  • table for counting things;
  • tapply for averaging things, and applying other functions across factor levels.
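A minimal sketch of the distinction, using a small made-up data set (the vectors species and height are hypothetical and chosen only for illustration): table counts how many times each value occurs, while tapply applies a function, here mean, to a response within each level.

```r
# Hypothetical data, invented for illustration only
species <- c("oak", "ash", "oak", "beech", "ash", "oak")
height  <- c(12.0, 9.5, 14.0, 11.0, 10.5, 13.0)

# table counts the occurrences of each distinct value
species_counts <- table(species)
print(species_counts)

# tapply applies a function (here mean) to height within each level of species
mean_height <- tapply(height, species, mean)
print(mean_height)
```

The first result holds counts per species; the second holds one mean height per species, so the two functions answer different questions about the same grouping.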

6.1 Tables of counts

The table function is perhaps the most useful of all the simple vector functions, because it does so much work behind the scenes. We have a vector of objects (they could be numbers or character strings) and we want to know how many of each is present in the vector. Take, for example, 1000 integers drawn at random from a Poisson distribution with mean 0.6 (the kind of vector that rpois(1000, 0.6) produces).
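A minimal sketch of generating such a vector, assuming the hypothetical name counts and an arbitrary seed (both are choices made for this illustration, so the values drawn will not match the tallies quoted in this section):

```r
# Assumption: the seed and the variable name are arbitrary choices for this sketch
set.seed(42)

# rpois(n, lambda) draws n random counts from a Poisson distribution with mean lambda
counts <- rpois(1000, lambda = 0.6)

# Inspect the first few values and confirm the length of the vector
print(head(counts))
print(length(counts))
```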


We want to count up all of the zeros, ones, twos, and so on. That would be a big task by hand, but a single call to table on the vector does it:

  0   1   2   3   4   5
539 325 110  24   1   1

There were 539 zeros, 325 ones, 110 twos, 24 threes, 1 four, 1 five and nothing larger than 5. That is a lot of work (imagine tallying them for yourself). The function works for characters as well as for numbers, and for multiple classifying variables:

infections<-read.table("c:\\temp\\disease.txt",header=TRUE) ...
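A minimal sketch of the multi-variable case, assuming a small invented data frame in place of the file read above (the name patients and its columns status and sex are hypothetical and are not taken from disease.txt):

```r
# Hypothetical data frame, invented for illustration; not the book's disease.txt
patients <- data.frame(
  status = c("infected", "clear", "clear", "infected", "clear", "clear"),
  sex    = c("female", "male", "female", "male", "male", "female")
)

# With two classifying variables, table returns a two-way contingency table:
# rows are the levels of the first argument, columns the levels of the second
status_by_sex <- table(patients$status, patients$sex)
print(status_by_sex)

# Individual cells can be picked out by level name
print(status_by_sex["clear", "male"])
```

Passing character vectors works the same way as passing numbers: each distinct string becomes a level, and every combination of levels gets its own cell.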
