Chapter 3 R Data, Part 2: More Complicated Structures

3.1 Introduction

R data is made up of vectors, but, as you already know, there are more complicated structures that consist of a group of vectors put together. In this chapter, we talk about the three major structures in R that data handlers need to know about: matrices, lists, and data frames. The most important of these is the data frame, in which, eventually, almost all of our data will be held. But in order to build up to the data frame, we first need to describe matrices and lists. A data frame is part matrix, part list, and in order to use data frames most efficiently, you need to be able to think of them in both ways. Furthermore, we do encounter matrices in the data cleaning world, since the table() command can produce something that is basically a matrix.
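
To make the "part matrix, part list" idea concrete, here is a minimal sketch. The data frame df, its columns id and score, and the two small vectors passed to table() are made up for illustration; they are not taken from the chapter's data.

```r
# Hypothetical toy data, invented for this sketch.
df <- data.frame(id = 1:3, score = c(90, 85, 77))

# Matrix-style access: pick a cell by row and column.
df[2, "score"]    # 85
dim(df)           # 3 2

# List-style access: pull out a whole column by name.
df[["score"]]     # 90 85 77
is.list(df)       # TRUE

# A two-way table() result has two dimensions, so it can be
# subscripted like a matrix.
g <- c("a", "b", "a")
h <- c("x", "x", "y")
tab <- table(g, h)
is.matrix(tab)    # TRUE
tab["a", "x"]     # 1
```

Both styles of access reach the same underlying columns, which is why it helps to be comfortable with each.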

3.2 Matrices

A matrix (plural matrices) is essentially a vector, arrayed in a (two-dimensional) rectangle. As with a vector, every element of a matrix needs to be of the same type – logical, numeric, or character. Most of the matrices we will see will be numeric, but it is also possible to have a logical matrix, typically for subscripting, as we shall see. We start by using the vector of 15 numbers, 101, 102, ..., 115, to produce a 5 × 3 (i.e., five rows by three columns) numeric matrix.
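
As a minimal sketch of the construction just described, the following builds a five-row, three-column matrix from 101:115. The argument nrow = 5 is our assumption, chosen to match the dimensions stated in the text; the names big and b are also ours, used only to illustrate a logical matrix and the same-type rule.

```r
# Assumption: nrow = 5 is our choice to get five rows by three columns.
a <- matrix(101:115, nrow = 5)

dim(a)            # 5 3
is.numeric(a)     # TRUE

# R fills a matrix column by column, so column 3 holds 111 to 115.
a[2, 3]           # 112

# A comparison yields a logical matrix of the same shape,
# which can then be used as a subscript.
big <- a > 110
is.logical(big)   # TRUE
a[big]            # 111 112 113 114 115

# Every element must share one type: assigning a character value
# converts the whole matrix to character.
b <- a
b[1, 1] <- "x"
typeof(b)         # "character"
```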

> (a <- matrix (101:115, ...
