Chapter 3 Data Preparation and Other Tricks

Package(s): gdata, chron

Dataset(s): 100mrun.xls, Earthwormbiomass.xls, Bacteria.XLS, nerve.dat, atombombtest.xls, airquality, wine.dat, sat, faithful, 2005-10.txt.gz

3.1 Introduction

Data come in many forms and with varying degrees of complexity. Even listing the major and minor sources of complexity in data preparation is a difficult task, and neither the form of the data nor its complexity may be known in advance. Consequently, it is hard to lay down a standard set of guidelines for teaching data preparation methods.

Complexities arise on several counts, such as the file type, missing data values, and columns holding different kinds of attributes. In some cases, the user may be unable to read the data correctly without rewriting the code several times. In Section 3.2, we use the options of the R function read.table to import external files that pose such difficulties. The options can be adjusted to handle problems in the data, to skip a given number of lines at the top of a file, and so forth. While learning, it is good practice to validate the data imported into R and check that they are as expected. It therefore helps to inspect the imported data with functions such as head, tail, str, and View, which are illustrated in Section 3.4. The R functions aggregate, with, and assign are effective for data manipulation without the need to create new R objects. The use of these ...
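The R sketch below illustrates these ideas under stated assumptions; it is not the book's own code for Sections 3.2 and 3.4. The inline text passed to read.table is hypothetical and stands in for an external file with two descriptive lines above the header, semicolons as separators, and "*" marking missing values. The object names (raw, dat, july_mean_temp, ozone_by_month, temp_month5 to temp_month9) are illustrative only. The second half uses the airquality data that ships with R to show with, aggregate, and assign.

```r
## Part 1: importing a "difficult" file with read.table options.
## The text below is hypothetical: it imitates an external file with two
## descriptive lines before the header, ";" as separator and "*" for missing values.
raw <- "Exported from a spreadsheet
Units: weight in grams
id;group;weight
1;A;12.5
2;A;*
3;B;14.1
4;B;13.7"

dat <- read.table(text = raw,
                  skip = 2,          # skip the two descriptive lines
                  header = TRUE,     # the next line holds the column names
                  sep = ";",         # fields are separated by semicolons
                  na.strings = "*")  # treat "*" as a missing value

## Validate the import before analysing it
str(dat)          # number of rows, column names and column types
head(dat, n = 2)  # first rows
tail(dat, n = 2)  # last rows
## View(dat)      # spreadsheet-like viewer; useful only in an interactive session

## Part 2: with, aggregate and assign on the built-in airquality data.
## with(): refer to the columns by name instead of typing airquality$...
july_mean_temp <- with(airquality, mean(Temp[Month == 7]))
july_mean_temp

## aggregate(): mean Ozone per month; the formula interface drops rows with NA
ozone_by_month <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
ozone_by_month

## assign(): create objects whose names are built at run time
for (m in 5:9) {
  assign(paste0("temp_month", m), airquality$Temp[airquality$Month == m])
}
length(temp_month7)  # number of July days in the data
```

Changing skip, sep, or na.strings is usually enough to adapt such a call to a file laid out differently, and str is a quick check that each column was read with the intended type.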

Get A Course in Statistics with R now with the O’Reilly learning platform.
