Field Guide to Hadoop

by Kevin Sitto, Marshall Presser

Released March 2015

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781491947883

Book description

If your organization is about to enter the world of big data, you need to decide not only whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking the Hadoop ecosystem down into short, digestible sections. You’ll quickly understand how Hadoop’s projects, subprojects, and related technologies work together.

Each chapter introduces a different topic—such as core technologies or data transfer—and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you’ll have a good grasp of the playing field.

Topics include:

Core technologies—Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
Database and data management—Cassandra, HBase, MongoDB, and Hive
Serialization—Avro, JSON, and Parquet
Management and monitoring—Puppet, Chef, ZooKeeper, and Oozie
Analytic helpers—Pig, Mahout, and MLlib
Data transfer—Sqoop, Flume, distcp, and Storm
Security, access control, auditing—Sentry, Kerberos, and Knox
Cloud computing and virtualization—Serengeti, Docker, and Whirr
