Chapter 6. Data Transformations
Where many programming languages rely on explicit loops, R has
traditionally encouraged vectorized operations and the apply family of
functions to process data in batches, which greatly streamlines the
calculations. Nothing prevents you from writing loops in R that break
your data into whatever chunks you want and then perform an operation
on each chunk. However, using vectorized functions can, in many cases,
increase the speed, readability, and maintainability of your code.
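To make the contrast concrete, here is a minimal sketch that converts temperatures with an explicit loop and then with a single vectorized expression. The data and the variable names (celsius, fahrenheit_loop, fahrenheit_vec) are made up for illustration.

```r
# Hypothetical sample data: 1,000 temperatures in degrees Celsius
set.seed(42)
celsius <- runif(1000, min = -10, max = 35)

# Loop version: preallocate the result, then convert one element at a time
fahrenheit_loop <- numeric(length(celsius))
for (i in seq_along(celsius)) {
  fahrenheit_loop[i] <- celsius[i] * 9 / 5 + 32
}

# Vectorized version: the arithmetic is applied to the whole vector at once
fahrenheit_vec <- celsius * 9 / 5 + 32

# Both approaches produce the same values
all.equal(fahrenheit_loop, fahrenheit_vec)
```

The vectorized line says what is being computed without the bookkeeping of an index and a preallocated result, which is where much of the readability gain comes from.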
More recently, though, the tidyverse (specifically the purrr and
dplyr packages) has introduced new idioms into R that make these
concepts easier to learn and slightly more consistent. The name purrr
comes from a play on the phrase “Pure R.” A “pure function” is a
function whose result is determined only by its
inputs, and which does not produce any side effects. This is not a
functional programming concept you need to understand in order to get
great value from purrr, however. All most users need to know is that
purrr contains functions to help us operate “chunk by chunk” on our
data in a way that meshes well with other tidyverse packages such as
dplyr.
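As a rough sketch of what "chunk by chunk" can look like, the example below splits the built-in mtcars data frame by cylinder count and applies a small pure function to each piece with purrr's map_dbl. It assumes the purrr package is installed; the helper name mean_mpg is our own invention for this illustration.

```r
library(purrr)

# mtcars ships with R; split it into one data frame per cylinder count
chunks <- split(mtcars, mtcars$cyl)

# A pure function: its result depends only on its input,
# and it has no side effects
mean_mpg <- function(df) mean(df$mpg)

# Apply the function to each chunk; map_dbl returns a named numeric vector
mpg_by_cyl <- map_dbl(chunks, mean_mpg)
mpg_by_cyl
```

In base R, sapply(chunks, mean_mpg) gives a comparable result here. One reason to prefer map_dbl is that it guarantees a numeric vector, or fails with an error if the function returns anything else.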
Base R has many apply functions—apply, lapply, sapply, tapply,
and mapply—as well as their cousins, by and split. These are solid functions that have been workhorses in Base R for years. We struggled a bit with how much to focus on the Base R apply functions and how much to focus on the newer “tidy” approach. After much ...