
HBase High Performance Cookbook

Name: HBase High Performance Cookbook
Author: Ruchir Choudhry
ISBN: 9781783983063

by Ruchir Choudhry

January 2017

Intermediate to advanced

350 pages

7h 8m

English

Packt Publishing


Table of Contents
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more; Why Subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book

Who this book is for
Sections
Getting ready; How to do it…; How it works…; There's more…; See also
Conventions
Reader feedback
Customer support
Downloading the example code; Errata; Piracy; Questions
1. Configuring HBase
Introduction
Configuring and deploying HBase
Getting ready; How to do it…; How it works…; There's more…; See Also
Using the filesystem
Getting ready; How to do it…; The HBase setup; Starting the cluster; Validating the cluster; How it works…; There is more…; See also
Administering clusters
Getting ready; How to do it…; Log dump; Metrics dump; How it works…; See also
Managing clusters
Getting ready; gmond; gmetad; gweb; How to do it…; Ganglia setup; How it works…; There is more…; See also
2. Loading Data from Various DBs
Introduction
Extracting data from Oracle
Getting ready; How to do it…; How it works…; There's more…; See also…
Loading data using Oracle Big data connector
Getting Ready; How to do it…; How it works…; There's more…; See also…
Bulk utilities
Getting ready...; How to do it…; How it works…; See also…
Using Hive with Apache HBase
Getting ready; How to do it…; How it works…; See also…
Using Sqoop
Getting ready; How to do it…; How it works…; There's more…; Data compression; Parallelism; See also…
3. Working with Large Distributed Systems Part I
Introduction
Scaling elastically or Auto Scaling with built-in fault tolerance
How to do it…; How it works…; There's more…; See also
Auto Scaling HBase using AWS
Getting Ready; How to do it…; There's more…; See also
Works on different VM/physical, cloud hardware
Getting ready; How to do it…; There's more…; See also
4. Working with Large Distributed Systems Part II
Introduction
Seek versus transfer; The log-structured merge-tree; Date Read; Data Delete; Storage
Read path
How to do it…; There's more…
Write Path
How to do it…; How it works…; There's more…; Transactions (ACID) and multiversion concurrency control (MVCC)
Snappy
How to do it…; How it works…; There's more…
LZO compression
How to do it…; How it works...; There's more…
LZ4 compressor
How to do it…; There's more…
Replication
How to do it…; Deploying Master-Master or Cyclic Replication; How it works...; There's more…; Disabling Replication at the Peer Level
5. Working with Scalable Structure of tables
Introduction
HBase data model part 1
How to do it…; How it works…; There's more…
HBase data model part 2
How to do it…; How it works…; There's more…
How HBase truly scales on key and schema design
How to do it…; See also
6. HBase Clients
Introduction
HBase REST and Java Client
How to do it…; How it works…; There's more…
Working with Apache Thrift
How to do it…; How it works…; There's more…
Working with Apache Avro
How to do it…; How it works…; There's more…
Working with Protocol buffer
How to do it…; There's More…
Working with Pig and using Shell
How to do it…; How it works…; There's more…
7. Large-Scale MapReduce
Introduction
Getting Ready…; How to do it…; How it works…; There's more…; When not to use MapReduce; See also…
8. HBase Performance Tuning
Introduction
Working with infrastructure/operating systems
Getting ready…; How to do it…
Working with Java virtual machines
Getting ready…; How to do it…; See also
Changing the configuration of components
Getting ready…; How to do it…; See also
Working with HDFS
How to do it…; See also….
9. Performing Advanced Tasks on HBase
Machine learning using Hbase
Getting ready; How to do it…; RDBMS; A plain Java program (static); There's more…
Real-time data analysis using Hbase and Mahout
How to do it…; How it works...; There's More…
Full text indexing using Hbase
Getting ready; How to do it…; How it works…; There's more…
10. Optimizing Hbase for Cloud
Introduction
Configuring Hbase for the Cloud
How to do it…; How it works…
Connecting to an Hbase cluster using the command line
How to do it…; How it works…
Backing up and restoring Hbase
How to do it…; How it works…
Terminating an HBase cluster
How to do it…
Accessing HBase data with hive
How to do it …
Viewing the Hbase user interface
How to do it …
Monitoring HBase with CloudWatch
Monitoring Hbase with Ganglia
How it works…; There is more…
11. Case Study
Introduction
Configuring Lily Platform
How to do it…; There's more…
Integrating elastic search with Hbase
Configuring
How to do it…; There's more…
Index

Content preview from HBase High Performance Cookbook

Chapter 7. Large-Scale MapReduce

In this chapter, we look at how to write MapReduce jobs, how to design large-scale MapReduce jobs on HBase, how the internals work, and how to tune HBase for this workload. Along the way, we discuss the following:

MapReduce frameworks
When to use MapReduce and when not to
Case study with example code and explanations

Introduction

HBase offers several ways to use MapReduce, depending on the stack and architecture you choose.

Before we start, let's quickly revisit the components used in a MapReduce job:

Record reader
Mapper
Combiner
Partitioner
Shuffle and sort
Reduce
Output format
Record reader: The core responsibility of a record reader is to analyze the ...
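
To make the roles of these components concrete, here is a minimal in-memory sketch of a word count that passes data through each stage in order. It is a toy illustration only and does not use the Hadoop or HBase APIs: every function name (`record_reader`, `mapper`, `combiner`, `partitioner`, `shuffle_and_sort`, `reducer`, `output_format`, `run_job`) is hypothetical, the word-count task is an assumed example, and a real job would run these stages across many machines rather than in one process.

```python
from collections import defaultdict
from itertools import groupby


def record_reader(split):
    """Turn one raw input split into (key, value) records: (line offset, line)."""
    offset = 0
    for line in split.splitlines():
        yield offset, line
        offset += len(line) + 1  # +1 for the newline removed by splitlines()


def mapper(_offset, line):
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate a single mapper's output before it is shuffled."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partitioner(key, num_reducers):
    """Pick the reducer for a key (a simple deterministic hash, for illustration)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Route every pair to its partition, then sort each partition by key."""
    partitions = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            partitions[partitioner(key, num_reducers)].append((key, value))
    for part in partitions:
        part.sort(key=lambda kv: kv[0])
    return partitions


def reducer(key, values):
    """Sum all counts seen for one key."""
    return key, sum(values)


def output_format(results):
    """Render the final records as tab-separated lines."""
    return "\n".join(f"{key}\t{value}" for key, value in results)


def run_job(splits, num_reducers=2):
    """Run the whole pipeline over a list of input splits."""
    mapper_outputs = []
    for split in splits:
        pairs = [kv for off, line in record_reader(split) for kv in mapper(off, line)]
        mapper_outputs.append(combiner(pairs))
    results = []
    for part in shuffle_and_sort(mapper_outputs, num_reducers):
        for key, group in groupby(part, key=lambda kv: kv[0]):
            results.append(reducer(key, [value for _, value in group]))
    return sorted(results)


if __name__ == "__main__":
    demo_splits = ["hbase stores rows\nhbase scales", "rows scale with hbase"]
    print(output_format(run_job(demo_splits)))
```

In this sketch the combiner runs once per split, so each simulated mapper sends at most one pair per distinct word into the shuffle, and the partitioner guarantees that all pairs for a given word reach the same reducer.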



