
R Data Analysis Cookbook - Second Edition by Kuntal Ganguly


Split-apply-combine with dplyr

plyr can be slow on very large datasets that require a lot of subsetting. dplyr is a package for data manipulation that provides a small set of easy-to-use functions, which are especially handy when analyzing and manipulating large datasets. Many dplyr functions have a close SQL counterpart, as shown in the following table:

dplyr function                Description                       Equivalent SQL
select()                      Select columns (variables)        SELECT
filter()                      Filter (subset) rows              WHERE
group_by()                    Group the data                    GROUP BY
summarise()                   Summarise (aggregate) the data    Aggregate functions such as SUM() or AVG()
arrange()                     Sort the data                     ORDER BY
inner_join(), left_join(),    Join data frames (tables)         JOIN
  and the other *_join()
mutate()                      Create new variables (columns)    Computed column with an AS alias
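As a rough illustration of the table, the following sketch applies each verb to a small data frame. The `sales` and `managers` data frames and their values are hypothetical, made up for this example, and the SQL in the comments is only an approximate equivalent. It assumes dplyr 1.0.0 or later, which added the `.groups` argument to `summarise()`.

```r
library(dplyr)

# Hypothetical example data (not from the book)
sales <- data.frame(
  region  = c("East", "West", "East", "West", "North"),
  product = c("A", "A", "B", "B", "A"),
  units   = c(10, 4, 7, 12, 3),
  price   = c(2.5, 2.5, 4.0, 4.0, 2.5),
  stringsAsFactors = FALSE
)

# Roughly: SELECT region, units FROM sales
selected <- select(sales, region, units)

# Roughly: SELECT * FROM sales WHERE units > 5
filtered <- filter(sales, units > 5)

# Roughly: SELECT *, units * price AS revenue FROM sales
with_revenue <- mutate(sales, revenue = units * price)

# Roughly: SELECT * FROM sales ORDER BY units DESC
sorted <- arrange(sales, desc(units))

# Roughly: SELECT region, SUM(units) AS total_units FROM sales GROUP BY region
by_region <- summarise(group_by(sales, region),
                       total_units = sum(units),
                       .groups = "drop")   # return an ungrouped result

# Hypothetical lookup table used only to demonstrate a join
managers <- data.frame(
  region  = c("East", "West"),
  manager = c("Ana", "Raj"),
  stringsAsFactors = FALSE
)

# Roughly: SELECT * FROM sales INNER JOIN managers USING (region)
# ("North" has no manager, so its row is dropped)
joined <- inner_join(sales, managers, by = "region")

print(by_region)
```

Each call takes a data frame as its first argument and returns a new data frame, so the verbs can be nested or chained freely.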


To install the dplyr package, type the following command:

install.packages("dplyr") ...
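Once dplyr is installed and loaded, a split-apply-combine operation is typically a `group_by()` followed by `summarise()`. The following minimal sketch uses R's built-in `mtcars` dataset (our choice of example data, not necessarily the book's) and the `%>%` pipe, which dplyr re-exports from magrittr. It assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Split-apply-combine on the built-in mtcars dataset
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%                  # split: one group per cylinder count
  summarise(mean_mpg = mean(mpg),    # apply: aggregate within each group
            n_cars   = n(),          # number of rows in each group
            .groups  = "drop") %>%   # combine: one ungrouped row per group
  arrange(desc(mean_mpg))            # sort groups by average fuel economy

print(mpg_by_cyl)
```

`group_by()` only records the grouping; the actual per-group computation happens in `summarise()`, which collapses each group to a single row.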
