Book description
Go beyond the basics and master the next generation of Hadoop data processing platforms
In Detail
Hadoop is synonymous with Big Data processing. Its simple programming model, "code once and deploy at any scale" paradigm, and ever-growing ecosystem make Hadoop an all-encompassing platform for programmers with different levels of expertise.
This book explores industry guidelines for optimizing MapReduce jobs and higher-level abstractions such as Pig and Hive in Hadoop 2.0. It then dives deep into Hadoop 2.0-specific features such as YARN and HDFS Federation.
This book is a step-by-step guide that focuses on advanced Hadoop concepts and aims to take your Hadoop knowledge and skill set to the next level. The data processing flow dictates the order of the concepts in each chapter, and each chapter is illustrated with code fragments or schematic diagrams.
What You Will Learn
- Understand the changes involved in moving from Hadoop 1.0 to Hadoop 2.0
- Customize and optimize MapReduce jobs in Hadoop 2.0
- Explore Hadoop I/O and different data formats
- Dive into YARN and Storm and use YARN to integrate Storm with Hadoop
- Deploy Hadoop on Amazon Elastic MapReduce
- Discover HDFS replacements and learn about HDFS Federation
- Get to grips with Hadoop's main security aspects
- Utilize Mahout and RHadoop for Hadoop analytics
Table of contents
- Mastering Hadoop
- Table of Contents
- Credits
- About the Author
- Acknowledgments
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Hadoop 2.X
- 2. Advanced MapReduce
- 3. Advanced Pig
  - Pig versus SQL
  - Different modes of execution
  - Complex data types in Pig
  - Compiling Pig scripts
  - Development and debugging aids
  - The advanced Pig operators
  - User-defined functions
  - Pig performance optimizations
  - Best practices
    - The explicit usage of types
    - Early and frequent projection
    - Early and frequent filtering
    - The usage of the LIMIT operator
    - The usage of the DISTINCT operator
    - The reduction of operations
    - The usage of Algebraic UDFs
    - The usage of Accumulator UDFs
    - Eliminating nulls in the data
    - The usage of specialized joins
    - Compressing intermediate results
    - Combining smaller files
  - Summary
- 4. Advanced Hive
- 5. Serialization and Hadoop I/O
- 6. YARN – Bringing Other Paradigms to Hadoop
- 7. Storm on YARN – Low Latency Processing in Hadoop
- 8. Hadoop on the Cloud
- 9. HDFS Replacements
- 10. HDFS Federation
- 11. Hadoop Security
- 12. Analytics Using Hadoop
- A. Hadoop for Microsoft Windows
- Index
Product information
- Title: Mastering Hadoop
- Author(s):
- Release date: December 2014
- Publisher(s): Packt Publishing
- ISBN: 9781783983643