Learning how to handle your data, how to enter it into the computer, and how to read the data into R are amongst the most important topics you will need to master. R handles data in objects known as dataframes. A dataframe is an object with rows and columns (a bit like a two-dimensional matrix). The rows contain different observations from your study, or measurements from your experiment. The columns contain the values of different variables. The values in the body of the dataframe can be numbers (as they would be in a matrix), but they could also be text (e.g. the names of factor levels for categorical variables, like ‘male’ or ‘female’ in a variable called ‘gender’), calendar dates (like 23/5/04), or logical values (like ‘TRUE’ or ‘FALSE’). Here is a spreadsheet in the form of a dataframe with seven variables. The leftmost variable comprises the row names; the variables are numeric (Area, Slope, Soil pH and Worm density), categorical (Field Name and Vegetation) or logical (Damp is either true = T or false = F).

Field Name        Area  Slope  Vegetation  Soil pH  Damp  Worm density
Nash's Field      3.6   11     Grassland   4.1      F     4
Silwood Bottom    5.1   2      Arable      5.2      F     7
Nursery Field     2.8   3      Grassland   4.3      F     2
Rush Meadow       2.4   5      Meadow      4.9      T     5
Gunness' Thicket  3.8   0      Scrub       4.2      F     6
Oak Mead          3.1   2      Grassland   3.9      F     2
Church Field      3.5   3      Grassland   4.2      F     3
Ashurst           2.1   0      Arable      4.8      F     4
The Orchard       1.9   0      Orchard     5.7      F     9
Rookery Slope     1.5   4      Grassland
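
To see how these column types look inside R, here is a minimal sketch that types in the first four rows of the table by hand using data.frame. The object name worms, the dotted column names (Soil.pH, Worm.density) and the name sampled are assumptions made for illustration; they do not come from the text. The incomplete Rookery Slope row is left out, and in practice you would normally read the whole spreadsheet from a file rather than type it in.

```r
# Illustrative only: the names 'worms', 'Soil.pH' and 'Worm.density' are assumed.
# Values are copied from the first four rows of the table above.
worms <- data.frame(
  Area         = c(3.6, 5.1, 2.8, 2.4),                  # numeric
  Slope        = c(11, 2, 3, 5),                         # numeric
  Vegetation   = factor(c("Grassland", "Arable",
                          "Grassland", "Meadow")),       # categorical (factor)
  Soil.pH      = c(4.1, 5.2, 4.3, 4.9),                  # numeric
  Damp         = c(FALSE, FALSE, FALSE, TRUE),           # logical
  Worm.density = c(4, 7, 2, 5),                          # numeric
  row.names    = c("Nash's Field", "Silwood Bottom",
                   "Nursery Field", "Rush Meadow")       # leftmost column
)

str(worms)                      # one line per column, showing its type
sapply(worms, class)            # the class of each column
levels(worms$Vegetation)        # factor levels, in alphabetical order
worms["Rush Meadow", "Damp"]    # look up one value by row name and column name

# A calendar date written like 23/5/04 needs an explicit format to be parsed.
sampled <- as.Date("23/5/04", format = "%d/%m/%y")
class(sampled)
```

Each row is one observation (a field) and each column is one variable, so asking for the class of every column is a quick way to check that R has stored numbers, factor levels and logical values as you intended.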

