Mastering Spark with R

by Javier Luraschi, Kevin Kuo, Edgar Ruiz
October 2019
Beginner to intermediate
293 pages
6h 55m
English
O'Reilly Media, Inc.

Chapter 5. Pipelines

You will never walk again, but you will fly!

—Three-Eyed Raven

In Chapter 4, you learned how to build predictive models using the high-level functions Spark provides and well-known R packages that work well with Spark. You learned about supervised methods first and finished the chapter with an unsupervised method over raw text.

In this chapter, we dive into Spark Pipelines, the engine that powers the features we demonstrated in Chapter 4. For instance, when you invoke an MLlib function via the formula interface in R, such as ml_logistic_regression(cars, am ~ .), a pipeline is constructed for you under the hood. Working with pipelines directly also lets you build more advanced data processing and modeling workflows. In addition, pipelines facilitate collaboration across data science and engineering teams by allowing you to deploy them into production systems, web applications, mobile applications, and so on.
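To make the "under the hood" remark concrete, here is a minimal sketch that places the formula call next to a hand-built pipeline. It assumes a local Spark installation and that cars is the mtcars dataset copied into Spark; the variable names model, pipeline, and pipeline_model are illustrative. The explicit pipeline is only a rough approximation of what the formula interface assembles, not a statement of its exact internals.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Assumption: `cars` is mtcars copied into Spark.
cars <- copy_to(sc, mtcars, name = "cars", overwrite = TRUE)

# Formula interface: a pipeline is built and fitted for you.
model <- ml_logistic_regression(cars, am ~ .)

# Roughly the same idea written out as explicit stages:
# a formula transformer followed by a logistic regression estimator.
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(am ~ .) %>%
  ml_logistic_regression()

# Fitting the pipeline against the data yields a pipeline model.
pipeline_model <- ml_fit(pipeline, cars)

spark_disconnect(sc)
```

Printing pipeline before fitting lists its stages, which is a convenient way to see what the formula call would otherwise hide.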

This chapter also happens to be the last chapter that encourages using your local computer as a Spark cluster. You are just one chapter away from getting properly introduced to cluster computing and beginning to perform data science or machine learning that can scale to the most demanding computation problems.

Overview

The building blocks of pipelines are objects called transformers and estimators, which are collectively referred to as pipeline stages. A transformer can be used to apply transformations to a DataFrame and ...

ISBN: 9781492046363