© Thomas Mailund 2017

Thomas Mailund, Beginning Data Science in R, 10.1007/978-1-4842-2671-1_3

3. Data Manipulation

Thomas Mailund

Aarhus, Denmark

Data science is as much about manipulating data as it is about fitting models to data. Data rarely arrives in a form that we can feed directly into the statistical models or machine learning algorithms we want to analyze it with. The first stages of a data analysis almost always involve figuring out how to load the data into R and then how to transform it into a shape you can readily analyze. The code in this chapter, and in all the following chapters, assumes that the packages magrittr and ggplot2 have been loaded (to avoid loading them explicitly in each example).
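A minimal setup sketch for the assumption above, presuming both packages are already installed (for example with `install.packages()`). The last line is only a quick check that magrittr's `%>%` pipe is available; `x %>% f()` evaluates to `f(x)`.

```r
# Attach the packages that the chapter's examples assume are loaded.
# Assumes both packages are already installed, e.g. via install.packages().
library(magrittr)  # supplies the %>% pipe operator
library(ggplot2)   # supplies ggplot() and related plotting functions

# Quick check that the pipe works: x %>% f() is equivalent to f(x)
c(1, 4, 9) %>% sqrt()
# [1] 1 2 3
```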

Data Already in R

There are some datasets ...
