Efficient data processing with R by Robin Lovelace, Colin Gillespie

Efficient Data Processing with dplyr

After tidying your data, the next stage is typically data processing. This includes creating new data, such as a new column that is some function of existing columns, and data analysis: the process of asking directed questions of the data and exporting the results in a user-readable form.
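The sketch below illustrates both kinds of processing on a small made-up data frame (hypothetical, not from the book). This excerpt does not name the verbs for these tasks. We assume dplyr's mutate() for adding a column, and group_by() with summarise() for answering a question of the data. All three are standard dplyr verbs.

```r
library(dplyr)

# Hypothetical example data (not from the book)
sales <- data.frame(
  region = c("North", "North", "South", "South"),
  units  = c(10, 20, 5, 15),
  price  = c(2.5, 2.5, 4.0, 4.0)
)

# Creating new data: a column computed from existing columns
sales <- sales %>%
  mutate(revenue = units * price)

# Data analysis: answer a directed question,
# "what is the total revenue in each region?"
revenue_by_region <- sales %>%
  group_by(region) %>%
  summarise(total_revenue = sum(revenue))

revenue_by_region
```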

We have carefully selected an appropriate package for these tasks: dplyr, whose name roughly means "data frame pliers." dplyr has a number of advantages over base R and data.table approaches to data processing:

  • dplyr is fast to run (due to its C++ backend) and intuitive to type.

  • dplyr works well with tidy data, as described previously.

  • dplyr works well with databases, providing efficiency gains on large datasets.

Furthermore, dplyr is efficient to learn. It has a small number of intuitively named functions, or verbs. These were partly inspired by SQL, one of the longest-established languages for data analysis. SQL combines multiple simple commands (such as SELECT and WHERE, roughly analogous to dplyr::select() and dplyr::filter()) to create powerful analysis workflows. Likewise, dplyr functions were designed to be used together to solve a wide range of data processing challenges (see Table 3-1).
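The SQL analogy can be sketched with R's built-in mtcars dataset, which we chose for illustration. The SQL query in the comment is illustrative only and is not run. The dplyr chain combines filter(), arrange() and select() with the %>% pipe, which dplyr re-exports from magrittr.

```r
library(dplyr)

# Illustrative SQL equivalent (not run):
#   SELECT mpg, cyl FROM mtcars WHERE cyl = 4 ORDER BY mpg DESC
four_cyl <- mtcars %>%
  filter(cyl == 4) %>%    # WHERE cyl = 4: keep matching rows
  arrange(desc(mpg)) %>%  # ORDER BY mpg DESC: sort rows
  select(mpg, cyl)        # SELECT mpg, cyl: keep the named columns

four_cyl
```

Each verb does one simple job. The pipe passes each intermediate result to the next step, much as SQL clauses combine within a single query.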

Table 3-1. dplyr verb functions

dplyr function(s)    Description                                              Base R functions
filter(), slice()    Subset rows by attribute (filter) or position (slice)    subset(), [
arrange()            Return data ordered by variable(s)                       order()
select()             Subset columns                                           subset() ...
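The sketch below pairs each dplyr verb in Table 3-1 with a base R counterpart so the two can be compared side by side. It again uses the built-in mtcars dataset, which we chose for illustration. Each base R form shown is one common choice, not the only option.

```r
library(dplyr)

# Subset rows by attribute: filter() vs subset()
d_rows <- filter(mtcars, cyl == 6)
b_rows <- subset(mtcars, cyl == 6)

# Subset rows by position: slice() vs [
d_pos <- slice(mtcars, 1:3)
b_pos <- mtcars[1:3, ]

# Order rows by a variable: arrange() vs order()
d_sorted <- arrange(mtcars, mpg)
b_sorted <- mtcars[order(mtcars$mpg), ]

# Subset columns: select() vs subset(select = ...)
d_cols <- select(mtcars, mpg, hp)
b_cols <- subset(mtcars, select = c(mpg, hp))
```

Every dplyr verb here takes a data frame as its first argument and returns a data frame. This consistency is what allows the verbs to be chained together.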
