June 2007
Beginner to intermediate
950 pages
27h 8m
English
There are three useful functions here:
| • summary | summarize all the contents of all the variables |
| • aggregate | create a table after the fashion of tapply |
| • by | perform functions for each level of specified factors |
The use of summary and by with the worms dataframe is shown on p. 110.
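For readers without p. 110 to hand, here is a minimal sketch of how summary and by behave. It assumes a small made-up dataframe called toy (a hypothetical stand-in, not the worms data) and uses colMeans as the function applied within each factor level; both choices are illustrative only.

```r
# Hypothetical toy dataframe; the values are invented and are NOT the worms data.
toy <- data.frame(
  Vegetation = factor(c("Arable", "Arable", "Meadow", "Meadow", "Scrub")),
  Area = c(3.0, 4.0, 2.0, 5.0, 1.5),
  Slope = c(1, 2, 0, 3, 7)
)

# summary: descriptive statistics for every column of the dataframe
# (counts per level for the factor, quartiles and mean for the numeric columns).
print(summary(toy))

# by: split the rows by the levels of a factor and apply a function to each piece.
# Here colMeans is applied to the numeric columns within each Vegetation level.
veg_means <- by(toy[, c("Area", "Slope")], toy$Vegetation, colMeans)
print(veg_means)
```

The result of by is a list-like object with one element per factor level, so a single level can be pulled out with veg_means[["Arable"]].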
The other useful function for summarizing a dataframe is aggregate. It is used like tapply (see p. 18) to apply a function (mean in this case) to the levels of a specified categorical variable (Vegetation in this case) for a specified range of variables (Area, Slope, Soil.pH and Worm.density are defined using their subscripts as a column index in worms[,c(2,3,5,7)]):
aggregate(worms[,c(2,3,5,7)],by=list(veg=Vegetation),mean)
veg Area Slope Soil.pH Worm.density
1 Arable 3.866667 1.333333 4.833333 5.333333
2 Grassland 2.911111 3.666667 4.100000 2.444444
3 Meadow 3.466667 1.666667 4.933333 6.333333
4 Orchard 1.900000 0.000000 5.700000 9.000000
5 Scrub 2.425000 7.000000 4.800000 5.250000
The by argument needs to be a list even if, as here, we have only one classifying factor. Here are the aggregated summaries for Vegetation and Damp:
aggregate(worms[,c(2,3,5,7)],by=list(veg=Vegetation,d=Damp),mean)
veg d Area Slope Soil.pH Worm.density
1 Arable FALSE 3.866667 1.333333 4.833333 5.333333
2 Grassland FALSE 3.087500 3.625000 3.987500 1.875000
3 Orchard FALSE 1.900000 0.000000 5.700000 9.000000
4 Scrub FALSE 3.350000 5.000000 4.700000 7.000000
5 Grassland TRUE 1.500000 4.000000 ...
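The two calls to aggregate can be tried without the worms file. The sketch below assumes a small made-up dataframe called toy (hypothetical, with invented values) and refers to its columns as toy$Vegetation and toy$Damp rather than relying on the dataframe being attached, as the bare names Vegetation and Damp in the text appear to do.

```r
# Hypothetical toy dataframe; the values are invented and are NOT the worms data.
toy <- data.frame(
  Vegetation = c("Arable", "Arable", "Meadow", "Meadow"),
  Damp = c(FALSE, FALSE, FALSE, TRUE),
  Area = c(3, 5, 2, 4),
  Slope = c(1, 3, 0, 2)
)

# One classifying factor: the 'by' argument must still be a list.
one_way <- aggregate(toy[, c("Area", "Slope")],
                     by = list(veg = toy$Vegetation), FUN = mean)
print(one_way)

# Two classifying factors: one output row for each combination of levels
# that actually occurs in the data; empty combinations are dropped.
two_way <- aggregate(toy[, c("Area", "Slope")],
                     by = list(veg = toy$Vegetation, d = toy$Damp), FUN = mean)
print(two_way)
```

The names given inside the list (veg and d here) become the column headings of the grouping columns in the result, which is why the output in the text is headed veg and d rather than Vegetation and Damp.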