Efficient data processing with R

Book description

What you’ll learn—and how you can apply it

You’ll learn to perform efficient data carpentry: the process of taking rough, raw, and to some extent randomly arranged input data and turning it into neatly organized, tidy data. Working with clean data benefits every subsequent stage of your R project.

In this Lesson, you will learn how to create user-friendly data frames with tibble, reshape data with tidyr operations such as gather and separate, process data efficiently with dplyr’s functions, and connect R to a range of database types.

This Lesson is for you because

You are working on a project in R and have reached the data processing stage. You want to clean, manipulate, and tidy your dataset to get it ready for the next stage (typically modeling and visualization).

Prerequisites

  • Some knowledge of R

Materials or downloads needed in advance

  • A working installation of RStudio

This Lesson relies on a number of packages for data cleaning and processing. Check that they are installed on your computer and load them with:

  • library("tibble")
  • library("tidyr")
  • library("stringr")
  • library("readr")
  • library("dplyr")
  • library("data.table")
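As a minimal sketch (not taken from the Lesson itself), the check-and-load step above can be automated in R. The snippet finds which of the listed packages are not yet installed, installs only those from CRAN, and then attaches all of them. It assumes a configured CRAN mirror and an internet connection for any installs. The variable names pkgs and to_install are illustrative.

```r
# Packages used throughout this Lesson, as listed above
pkgs <- c("tibble", "tidyr", "stringr", "readr", "dplyr", "data.table")

# Which of these are not yet installed on this machine?
to_install <- pkgs[!pkgs %in% rownames(installed.packages())]

# Install only the missing ones (assumes a CRAN mirror and network access)
if (length(to_install) > 0) install.packages(to_install)

# Attach every package; character.only = TRUE lets library() accept a string
invisible(lapply(pkgs, library, character.only = TRUE))
```

Installing only the missing packages avoids reinstalling (and possibly upgrading) packages you already have.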

RSQLite and ggmap are also used in a couple of examples, though they are not central to the Lesson’s content.


Product information

  • Title: Efficient data processing with R
  • Author(s): Colin Gillespie, Robin Lovelace
  • Release date: December 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491980729