Data Analytics with Hadoop

by Benjamin Bengfort, Jenny Kim
June 2016
Intermediate to advanced
286 pages
8h 9m
English
O'Reilly Media, Inc.
Content preview from Data Analytics with Hadoop

Chapter 8. Analytics with Higher-Level APIs

In Chapter 6, we touched upon some of the motivations for working in a higher-level language such as Hive rather than native MapReduce, which can be difficult, unwieldy, and verbose even for relatively simple operations. Even experienced Java and MapReduce programmers find that most non-trivial Hadoop applications entail a long development cycle: writing and chaining several mappers and reducers to form a complex job chain or data processing workflow.

Furthermore, because MapReduce is designed to run in a batch-oriented fashion, it presents a number of limitations for data analysis that entails iterative processing (including many machine learning algorithms) or interactive data mining that requires responsive feedback. These criticisms of native MapReduce, regarding development efficiency, maintenance, and runtime performance, provide much of the motivation for both higher-level abstractions of Hadoop and a new processing engine that extends the MapReduce paradigm.

In this chapter, we introduce Pig, a programming abstraction of MapReduce that facilitates building MapReduce-based data flows. We also introduce some newer Spark APIs that extend the core RDD APIs by making it easier for developers to compute over structured data using familiar SQL-based concepts and syntax. These projects seek to boost developer productivity in programming MapReduce and Spark applications by providing expressive APIs that allow analysts ...

ISBN: 9781491913734