A very common operation is selecting certain rows from the dataframe on the basis of values in one or more of the variables (the columns of the dataframe). Suppose we want to restrict the data to cases from damp fields. We want all the columns, so the syntax for the subscripts is [‘which rows’, blank]:
worms[Damp == T,]
      Field.Name Area Slope Vegetation Soil.pH Damp Worm.density
4    Rush.Meadow  2.4     5     Meadow     4.9 TRUE            5
10 Rookery.Slope  1.5     4  Grassland     5.0 TRUE            7
15    Pond.Field  4.1     0     Meadow     5.0 TRUE            6
16  Water.Meadow  3.9     0     Meadow     4.9 TRUE            8
17     Cheapside  2.2     8      Scrub     4.7 TRUE            4
20     Farm.Wood  0.8    10      Scrub     5.1 TRUE            3
Note that because Damp is a logical variable (with just two potential values, TRUE or FALSE) we can refer to true or false in abbreviated form, T or F. Also notice that the T in this case is not enclosed in quotes: the T means true, not the character string ‘T’. The other important point is that the symbol for the logical condition is == (two successive equals signs with no gap between them; see p. 27).
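As a minimal sketch of the same idea, the following uses a small hypothetical dataframe called fields (its name and all of its values are invented for illustration; it is not the worms data). It refers to the column as fields$Damp, so it does not rely on the dataframe having been attached, and it spells out TRUE in full because T is an ordinary variable that can be overwritten, whereas TRUE is a reserved word.

```r
# Hypothetical stand-in for a dataframe like worms (values invented)
fields <- data.frame(
  Field.Name   = c("A", "B", "C", "D"),
  Soil.pH      = c(4.9, 5.3, 5.0, 5.6),
  Damp         = c(TRUE, FALSE, TRUE, FALSE),
  Worm.density = c(5, 2, 7, 9)
)

# Rows where Damp is TRUE, all columns (blank after the comma)
damp1 <- fields[fields$Damp == TRUE, ]

# Damp is already logical, so it can be used as the row subscript directly
damp2 <- fields[fields$Damp, ]

print(damp1)
print(identical(damp1, damp2))
```

Because the column is already a logical vector, the comparison with TRUE is optional; the two subscripts pick out the same rows.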
The logic for the selection of rows can refer to values (and functions of values) in more than one column. Suppose that we wanted the data from the fields where worm density was higher than the median (>median(Worm.density)) and soil pH was less than 5.2. In R, the logical operator for AND is the & symbol:
worms[Worm.density > median(Worm.density) & Soil.pH < 5.2,]
   Field.Name Area Slope Vegetation Soil.pH Damp Worm.density
...
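The way the two conditions combine can be seen by building the logical vectors separately. This is only a sketch on a hypothetical dataframe called fields (invented values, not the worms data); the names above.median, acidic and keep are likewise made up for the illustration. The base R function subset is shown as one alternative way of writing the same selection.

```r
# Hypothetical stand-in for a dataframe like worms (values invented)
fields <- data.frame(
  Field.Name   = c("A", "B", "C", "D"),
  Soil.pH      = c(4.9, 5.3, 5.0, 5.6),
  Worm.density = c(5, 2, 7, 9)
)

# One logical value per row for each condition
above.median <- fields$Worm.density > median(fields$Worm.density)
acidic       <- fields$Soil.pH < 5.2

# & works element by element: a row is kept only if both are TRUE
keep <- above.median & acidic

selected <- fields[keep, ]
print(selected)

# subset() evaluates the condition inside the dataframe,
# so column names can be used without the fields$ prefix
selected2 <- subset(fields, Worm.density > median(Worm.density) & Soil.pH < 5.2)
print(selected2)
```

Note that & is the element-by-element operator; the double form && is intended for single logical values, as in if statements, and is not suitable for selecting rows.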