Chapter 3. Filtering and Summarizing Data

After loading data from flat files or databases (as we saw in Chapter 1, Hello, Data!) or directly from the web via APIs (as covered in Chapter 2, Getting Data from the Web), we often have to aggregate, transform, or filter the original dataset before the actual data analysis can take place.

In this chapter, we will focus on how to:

  • Filter rows and columns in data frames
  • Summarize and aggregate data
  • Improve the performance of such tasks with the dplyr and data.table packages, in addition to the base R methods
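
As a rough preview of the first two tasks, here is a minimal base R sketch. It assumes the built-in mtcars dataset purely for illustration (it is not necessarily the dataset used later in the chapter), and the object names four_cyl and mean_mpg are hypothetical.

```r
# Illustrative only: mtcars ships with R and stands in for a real dataset.

# Filter rows (4-cylinder cars) and keep only two columns.
four_cyl <- subset(mtcars, cyl == 4, select = c(mpg, hp))

# Summarize: mean miles per gallon for each cylinder count.
mean_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

print(head(four_cyl))
print(mean_mpg)
```

Both subset() and aggregate() are part of base R, so this sketch needs no extra packages; the dplyr and data.table equivalents mainly become worthwhile on larger data.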

Drop needless data

Although not loading the needless data is the optimal solution (see the Loading a subset of text files and Loading data from databases sections in Chapter 1, Hello, Data! ...
