Data manipulation with dplyr

Over the past couple of years I have been using dplyr more and more to manipulate and summarize data. It is often faster than the equivalent base R functions, lets you chain functions together, and, once you are familiar with it, has a more user-friendly syntax. In my experience, just a few functions can accomplish the majority of your data manipulation needs. Install the package as described above, then load it into the R environment.

    > library(dplyr)

Let's explore the iris dataset available in base R. Two of the most useful functions are summarize() and group_by(). In the code that follows, we see how to produce a table of the mean of Sepal.Length grouped by Species. The variable that holds the mean will be called average ...
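
As a minimal sketch of the grouped summary just described, the call could look something like the following. It assumes dplyr is installed and uses the %>% pipe that dplyr makes available; the object name species_means is only illustrative and does not come from the text.

```r
library(dplyr)

# Group the built-in iris data by Species, then collapse each group
# to a single row holding the mean of Sepal.Length.
# The summary column is named `average`, as described in the text.
species_means <- iris %>%
  group_by(Species) %>%
  summarize(average = mean(Sepal.Length))

print(species_means)
```

The result should be a small table with one row per species and two columns, Species and average. Chaining group_by() into summarize() with the pipe avoids creating an intermediate grouped object by hand.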
