Mastering Spark with R
by Javier Luraschi, Kevin Kuo, Edgar Ruiz
October 2019
Beginner to intermediate
293 pages
6h 55m
English
O'Reilly Media, Inc.

Chapter 3. Analysis

First lesson: stick them with the pointy end.

—Jon Snow

Previous chapters focused on introducing Spark with R, getting you up to speed and encouraging you to try basic data analysis workflows. However, they have not properly introduced what data analysis means, especially with Spark. They presented the tools you will need throughout this book—tools that will help you spend more time learning and less time troubleshooting.

This chapter introduces tools and concepts to perform data analysis in Spark from R. Spoiler alert: these are the same tools you use with plain R! This is not a mere coincidence; rather, we want data scientists to live in a world where technology is hidden from them, where you can use the R packages you know and love, and they “just work” in Spark! Now, we are not quite there yet, but we are also not that far. Therefore, in this chapter you learn widely used R packages and practices to perform data analysis—dplyr, ggplot2, formulas, rmarkdown, and so on—which also happen to work in Spark.
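As a rough illustration of the "same tools" idea, here is a minimal sketch that runs an ordinary dplyr pipeline against a Spark table through sparklyr. It assumes a local Spark installation is available (for example, one set up with `spark_install()`). The table name `cars` and the result name `avg_mpg` are illustrative choices, not taken from the text.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and reachable with master = "local".
sc <- spark_connect(master = "local")

# Copy a small built-in dataset into Spark; `cars` is a reference to the
# remote table, not a local data frame.
cars <- copy_to(sc, mtcars, "cars", overwrite = TRUE)

# The same dplyr verbs used on local data; sparklyr translates them to
# Spark SQL, and collect() brings the small result back into R.
avg_mpg <- cars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

spark_disconnect(sc)
```

Apart from `copy_to()` and `collect()`, the pipeline is what you would write for a local data frame, which is the point the paragraph above is making.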

Chapter 4 will focus on creating statistical models to predict, estimate, and describe datasets, but first, let’s get started with analysis!

Overview

In a data analysis project, the main goal is to understand what the data is trying to “tell us,” hoping that it provides an answer to a specific question. Most data analysis projects follow a set of steps, as shown in Figure 3-1.

As the diagram illustrates, we first import data into our analysis stem, where ...

ISBN: 9781492046363