
Learning HBase

Name: Learning HBase
Author: Shashwat Shriparv
ISBN: 9781783985944

by Shashwat Shriparv

November 2014

Beginner to intermediate

326 pages

7h 4m

English

Packt Publishing


Table of Contents
Learning HBase
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
  Support files, eBooks, discount offers, and more
  Why subscribe?
  Free access for Packt account holders
Preface
What this book covers
What you need for this book

Who this book is for
Conventions
Reader feedback
Customer support
  Downloading the example code
  Errata
  Piracy
  Questions
1. Understanding the HBase Ecosystem
HBase layout on top of Hadoop
Comparing architectural differences between RDBMs and HBase
HBase features
HBase in the Hadoop ecosystem
  Data representation in HBase
  Hadoop
  Core daemons of Hadoop
  Comparing HBase with Hadoop
Comparing functional differences between RDBMs and HBase
  Logical view of row-oriented databases
  Logical view of column-oriented databases
  Pros and cons of column-oriented databases
About the internal storage architecture of HBase
Getting started with HBase
  When it started
  HBase components and functionalities
  ZooKeeper
  Why an odd number of ZooKeepers?
  HMaster
  If a master node goes down
  RegionServer
  Components of a RegionServer
  Client
  Catalog tables
  Who is using HBase and why?
  When should we think of using HBase?
  When not to use HBase
  Understanding some open source HBase tools
  The Hadoop-HBase version compatibility table
Applications of HBase
HBase pros and cons
Summary
2. Let's Begin with HBase
Understanding HBase components in detail
  HFile
  Region
  Scalability – understanding the scale up and scale out processes
  Scale in
  Scale out
Reading and writing cycle
  Write-Ahead Logs
  MemStore
HBase housekeeping
  Compaction
  Minor compaction
  Major compaction
  Region split
  Region assignment
  Region merge
  RegionServer failovers
The HBase delete request
The reading and writing cycle
List of available HBase distributions
Prerequisites and capacity planning for HBase
  The forward DNS resolution
  The reverse DNS resolution
  Java
  SSH
  Domain Name Server
  Using Network Time Protocol to keep your node on time
  OS-level changes and tuning up OS for HBase
Summary
3. Let's Start Building It
Downloading Java on Ubuntu
Considering host configurations
  Host file based
  Command based
  File based
  DNS based
Installing and configuring SSH
  Installing SSH on Ubuntu/Red Hat/CentOS
  Configuring SSH
Installing and configuring NTP
Performing capacity planning
Installing and configuring Hadoop
  core-site.xml
  hdfs-site.xml
  yarn-site.xml
  mapred-site.xml
  hadoop-env.sh
  yarn-env.sh
  Slaves file
Hadoop start up steps
Configuring Apache HBase
  Configuring HBase in the standalone mode
  Configuring HBase in the distributed mode
  hbase-site.xml
  HBase-env.sh
  regionservers
Installing and configuring ZooKeeper
Installing Cloudera Hadoop and HBase
  Downloading the required RPM packages
  Installing Cloudera in an easier way
Installing the Hadoop and MapReduce packages
Installing Hadoop on Windows
Summary
4. Optimizing the HBase/Hadoop Cluster
Setup types for Hadoop and HBase clusters
Recommendations for CDH cluster configuration
Capacity planning
Hadoop optimization
  General optimization tips
  Optimizing Java GC
  Optimizing Linux OS
  Optimizing the Hadoop parameter
  Optimizing MapReduce
  Rack awareness in Hadoop
  Number of Map and Reduce limits in configuration files
  Considering and deciding the maximum number of Map and Reduce tasks
Optimizing HBase
  Hadoop
  Memory
  Java
  OS
  HBase
Optimizing ZooKeeper
Important files in Hadoop
Important files in HBase
Summary
5. The Storage, Structure Layout, and Data Model of HBase
Data types in HBase
Storing data in HBase – logical view versus actual physical view
  Namespace
  Commands available for namespaces
Services of HBase
  Row key
  Column family
  Column
  Cell
  Version
  Timestamp
Data model operations
  Get
  Put
  Scan
  Delete
Versioning and why
Deciding the number of the version
  Lower bound of versions
  Upper bound of versions
Schema designing
  Types of table designs
  Benefits of Short Wide and Tall-Thin design patterns
  Composite key designing
  Real-time use case of schema in an HBase table
  Schema change operations
Calculating the data size stored in HBase
Summary
6. HBase Cluster Maintenance and Troubleshooting
Hadoop shell commands
  Types of Hadoop shell commands
  Administration commands
  User commands
  File system-related commands
  Difference between copyToLocal/copyFromLocal and get/put
HBase shell commands
HBase administration tools
  hbck – HBase check
  HBase health check script
Writing HBase shell scripts
Using the Hadoop tool or JARs for HBase
Connecting HBase with Hive
HBase region management
  Compaction
  Merge
HBase node management
  Commissioning
  Decommissioning
Implementing security
  Secure access
  Requirement
  Kerberos KDC
  Client-side security configuration
  Client-side security configuration for thrift requests
  Server-side security configuration
  Simple security
  Server-side configuration
  Client-side configuration
  The tag security feature
  Access control in HBase
  Server-side access control
  Cell-level access using tags
  Configuring ZooKeeper for security
Troubleshooting the most frequent HBase errors and their explanations
  What might fail in cluster
  Monitoring HBase health
  HBase web UI
  Master
  RegionServer
  ZooKeeper command line
  Linux tools
Summary
7. Scripting in HBase
HBase backup and restore techniques
  Offline backup / full-shutdown backup
  Backup
  Restore
  Online backup
  The HBase snapshot
  Online
  Offline
  The HBase replication method
  Setting up cluster replication
  Backup and restore using Export and Import commands
  Export
  Import
  Miscellaneous utilities
  CopyTable
  HTable API
  Backup using a Mozilla tool
HBase on Windows
Scripting in HBase
  The .irbrc file
  Getting the HBase timestamp from HBase shell
  Enabling debugging shell
  Enabling the debug level in HBase shell
  Enabling SQL in HBase
Contributing to HBase
Summary
8. Coding HBase in Java
Setting up the environment for development
  Building a Java client to code in HBase
Data types
Data model Java operations
  Read
  Get()
  Constructors
  Supported methods
  Scan()
  Constructors
  Methods
  Write
  Put()
  Constructors
  Methods
  Modify
  Delete()
  Constructors
  Methods
HBase filters
Types of filters
Client APIs
Summary
9. Advance Coding in Java for HBase
Interfaces, classes, and exceptions
Code related to administrative tasks
Data operation code
MapReduce and HBase
RESTful services and Thrift services interface
  REST service interfaces
  Thrift
Coding for HDFS operations
Some advance topics in brief
  Coprocessors
  Types of coprocessors
  Bloom filters
  The Lily project
  Features
Summary
10. HBase Use Cases
HBase in industry today
The future of HBase against relational databases
Some real-world project examples' use cases
  HBase at Facebook
  Choosing HBase
  Storing in HBase
  The architecture of a Facebook message
  Facts and figures
  HBase at Pinterest
  The layout architecture
  HBase at Groupon
  The layout architecture
  HBase at LongTail Video
  The layout architecture
  HBase at Aadhaar (UIDAI)
  The layout architecture
Useful links and references
Summary
Index

Overview

In "Learning HBase", you'll dive deep into the core functionality of Apache HBase and understand how it is applied in Big Data environments. By exploring both theoretical concepts and practical scenarios, you'll acquire the skills to set up, manage, and optimize HBase clusters.

What this Book will help me do

Understand and explain the components of the HBase ecosystem.
Install and configure HBase clusters for optimized performance.
Develop and maintain applications using HBase's structured storage model.
Troubleshoot and resolve common issues in HBase deployments.
Leverage Hadoop tools and advanced techniques to enhance HBase capabilities.

Author(s)

Shashwat Shriparv is a skilled technologist with a robust background in Big Data tools and application development. With hands-on expertise in distributed storage systems and data analytics, they lend exceptional insights into managing HBase environments. Their approach combines clarity, practicality, and a focus on real-world applicability.

Who is it for?

This book is ideal for system administrators and developers who are starting their journey in Big Data technology. With clear explanations and hands-on scenarios, it suits those seeking foundational and intermediate knowledge of the HBase ecosystem. It helps students, early-career professionals, and mid-level technologists build their expertise. If you work in Big Data and want to grow your skill set in distributed storage systems, this book is for you.


