Data Analytics with Hadoop

by Benjamin Bengfort, Jenny Kim
June 2016
Intermediate to advanced
286 pages
8h 9m
English
O'Reilly Media, Inc.

Chapter 9. Machine Learning

Machine learning computations aim to derive predictive models from current and historical data. The underlying premise is that a learning algorithm improves with more training or experience. In particular, machine learning algorithms can achieve extremely effective results in very narrow domains when their models are trained on large datasets.

As a result, most machine learning algorithms involve computation at scale. Machine learning computations are therefore well suited to a distributed computing paradigm, such as Spark, which can leverage large training sets to produce meaningful results. This chapter introduces Spark’s built-in machine learning library, Spark MLlib, which provides many common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction. It also introduces a new “ML-pipeline” framework, spark.ml, which provides a uniform set of high-level APIs that help users create and tune practical machine learning pipelines.1
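
To make the idea of a uniform pipeline API more concrete, the following is a minimal plain-Python sketch of the general pattern: stages that share a common `transform` interface, estimators whose `fit` returns a fitted transformer, and a pipeline that chains them. It does not use Spark at all. Every class here (`Tokenizer`, `WordCountClassifier`, `WordCountModel`, `Pipeline`, `PipelineModel`) is a hypothetical stand-in written for illustration, and the toy training data is invented; the actual spark.ml classes operate on distributed DataFrames and have richer signatures.

```python
# Plain-Python sketch of the pipeline pattern. These classes are hypothetical
# stand-ins for illustration only; they are NOT the spark.ml API.


class Tokenizer:
    """Transformer: adds a 'words' field with the lowercased tokens."""

    def transform(self, rows):
        return [dict(row, words=row["text"].lower().split()) for row in rows]


class WordCountClassifier:
    """Estimator: learns per-label word counts; fit() returns a model."""

    def fit(self, rows):
        counts = {}
        for row in rows:
            label_counts = counts.setdefault(row["label"], {})
            for word in row["words"]:
                label_counts[word] = label_counts.get(word, 0) + 1
        return WordCountModel(counts)


class WordCountModel:
    """Transformer produced by fitting: adds a 'prediction' field."""

    def __init__(self, counts):
        self.counts = counts

    def transform(self, rows):
        out = []
        for row in rows:
            # Score each label by how often it saw the row's words in training.
            scores = {
                label: sum(word_counts.get(w, 0) for w in row["words"])
                for label, word_counts in self.counts.items()
            }
            out.append(dict(row, prediction=max(scores, key=scores.get)))
        return out


class Pipeline:
    """Chains stages; fit() returns a PipelineModel holding only transformers."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):     # estimator: fit it to get a transformer
                stage = stage.fit(rows)
            rows = stage.transform(rows)  # pass transformed rows to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """Fitted pipeline: applies each fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


# Invented toy data, purely for illustration.
training = [
    {"text": "spark runs on hadoop", "label": "tech"},
    {"text": "hadoop stores big data", "label": "tech"},
    {"text": "the team won the match", "label": "sports"},
    {"text": "a great match for the fans", "label": "sports"},
]

model = Pipeline([Tokenizer(), WordCountClassifier()]).fit(training)
predictions = model.transform([{"text": "Spark and Hadoop"}, {"text": "the match"}])
print([p["prediction"] for p in predictions])
```

The point of the design is that the caller only ever calls `fit` on the pipeline and `transform` on the fitted result, so stages can be swapped or reordered without changing the surrounding code.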

Scalable Machine Learning with Spark

In Chapter 4, we introduced Spark as an in-memory distributed computing engine that can run on a Hadoop cluster. The Spark platform also ships with several built-in components that use Spark’s processing engine to enable other types of analytical workloads, which benefit from Spark’s computational optimizations. In this chapter, we’ll take a closer look at Spark’s built-in machine ...


Publisher Resources

ISBN: 9781491913734