Chapter 9. Capstone: R for Data Analytics

In this chapter, we’ll apply what we’ve learned about data analysis and visualization in R to explore and test relationships in the familiar mpg dataset. You’ll also pick up a couple of new R techniques, including how to conduct a t-test and fit a linear regression. We’ll begin by loading the necessary packages, reading in mpg.csv from the mpg subfolder of the book repository’s datasets folder, and selecting the columns of interest. We haven’t used tidymodels so far in this book, so you may need to install it first.
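
If you are not sure which of the chapter’s packages are already installed, a check along the following lines can help. This is a minimal sketch rather than part of the book’s own code: the helper name missing_packages is hypothetical, and the install step assumes a CRAN mirror is configured. It only installs when R is running interactively, so a scripted run will not trigger downloads.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The three packages this chapter loads with library().
to_install <- missing_packages(c("tidyverse", "psych", "tidymodels"))

# Install only what is missing, and only in an interactive session
# (assumes a CRAN mirror is configured).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

If to_install comes back empty, all three library() calls should succeed as written.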

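library(tidyverse)   # read_csv(), select(), and the %>% pipe
library(psych)       # describe() for descriptive statistics
library(tidymodels)  # new in this chapter; install it first if needed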

# Read in the data, select only the columns we need
mpg <- read_csv('datasets/mpg/mpg.csv') %>%
  select(mpg, weight, horsepower, origin, cylinders)

#> -- Column specification -----------------------------------------------------
#> cols(
#>  mpg = col_double(),
#>  cylinders = col_double(),
#>  displacement = col_double(),
#>  horsepower = col_double(),
#>  weight = col_double(),
#>  acceleration = col_double(),
#>  model.year = col_double(),
#>  origin = col_character(),
#>  car.name = col_character()
#> )

head(mpg)
#> # A tibble: 6 x 5
#>     mpg weight horsepower origin cylinders
#>   <dbl>  <dbl>      <dbl> <chr>      <dbl>
#> 1    18   3504        130 USA            8
#> 2    15   3693        165 USA            8
#> 3    18   3436        150 USA            8
#> 4    16   3433        150 USA            8
#> 5    17   3449        140 USA            8
#> 6    15   4341        198 USA            8

Exploratory Data Analysis

Descriptive statistics are a good place to start when exploring data. We’ll do so with the describe() function from psych ...
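
As a rough illustration of calling describe(), the sketch below rebuilds only the six rows printed by head(mpg) above as a stand-in for the full file, so the numbers it produces describe those six cars and not the whole dataset. The object names mpg_head, numeric_cols, and desc are assumptions made for this example, and the code checks that psych is installed before using it.

```r
# The six rows shown by head(mpg) above; a stand-in for the full dataset.
mpg_head <- data.frame(
  mpg = c(18, 15, 18, 16, 17, 15),
  weight = c(3504, 3693, 3436, 3433, 3449, 4341),
  horsepower = c(130, 165, 150, 150, 140, 198),
  origin = rep("USA", 6),
  cylinders = rep(8, 6)
)

# describe() summarizes numeric variables, so leave out the character column.
numeric_cols <- mpg_head[, c("mpg", "weight", "horsepower", "cylinders")]

if (requireNamespace("psych", quietly = TRUE)) {
  # One row per variable, with columns such as n, mean, sd, median, min, and max.
  desc <- psych::describe(numeric_cols)
  print(desc)
} else {
  message("psych is not installed; run install.packages('psych') first")
}
```

With the full mpg tibble from earlier in the chapter, the same call would apply after dropping or converting the origin column.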
