Chapter 4. HBase Sizing and Tuning Overview

The two most important aspects of building an HBase application are sizing and schema design. This chapter will focus on the sizing considerations to take into account when building an application. We will discuss schema design in Part II.

In addition to negatively impacting performance, sizing an HBase cluster incorrectly will reduce stability. Undersized clusters tend to suffer from client timeouts, RegionServer failures, and longer recovery times. Meanwhile, a properly sized and tuned HBase cluster will perform better and meet SLAs consistently because the internals will fluctuate less, which in turn means fewer compactions (major and minor), fewer region splits, and less block cache churn.

Sizing an HBase cluster is a fine art that requires an understanding of the application's needs prior to deployment. You will want to understand both the read and the write access patterns before attempting to size the cluster. Because it involves taking numerous aspects into consideration, proper HBase sizing can be challenging. Before beginning cluster sizing, it’s important to analyze the requirements for the project. These requirements should be broken down into three categories:

Workload

This requires understanding general concurrency, usage patterns, and ingress/egress workloads.

Service-level agreements (SLAs)

You should have an SLA with fully quantified read and write latencies, as well as an understanding of the tolerance for variance.

Capacity

You need to consider how much data is ingested daily, the total data retention time, and total data size over the lifetime of the project.

After fully understanding these project requirements, we can move on to cluster sizing. Because HBase relies on HDFS, it is important to take a bottom-up approach when designing an HBase cluster. This means starting with the hardware and the network before moving on to the operating system, HDFS, and finally HBase.

Hardware

The hardware requirements for an HBase-only deployment are cost friendly compared to those of a large, multitenant deployment. HBase currently can use about 16–24 GB of memory for the heap, though that will change with Java 7 and the G1GC algorithm. Initial testing with the G1GC collector has shown very promising results with heaps over 100 GB in size. This becomes especially important when attempting to vertically scale HBase, as we will see later in our discussion of the sizing formulas.

When using heaps larger than 24 GB, garbage collection pauses can become so long (30 seconds or more) that RegionServers start to time out with ZooKeeper, causing failures. Currently, stocking the DataNodes with 64–128 GB of memory will be sufficient to cover the RegionServers, DataNodes, and operating system, while leaving enough space to allow the block cache to assist with reads. In a read-heavy environment looking to leverage BucketCache, 256 GB or more of RAM could add significant value.

From a CPU standpoint, an HBase cluster does not need a high core count, so a mid-range clock speed and a lower-end core count are quite sufficient. As of early 2016, standard commodity machines are shipping with dual octa-core processors on the low end, and dual dodeca-core processors with a clock speed of about 2.5 GHz. If you are creating a multitenant cluster (particularly if you plan to utilize MapReduce), it is beneficial to use a higher core count (the exact number will be dictated by your use case and the components used). Multitenant use cases can be very difficult, especially when HBase is bound by tight SLAs, so it is important to test with both MapReduce jobs and an HBase workload. In order to maintain SLAs, it is sometimes necessary to have a batch cluster and a real-time cluster; as the technology matures, this will become less and less likely. One of the biggest benefits of HBase/Hadoop is that a homogeneous environment is not required (though it is recommended for your sanity). This allows for flexibility when purchasing hardware, and allows different hardware specifications to be added to the cluster on an ongoing basis whenever they are required.

Storage

As with Hadoop, HBase takes advantage of a JBOD disk configuration. The benefits offered by JBOD are twofold: first, it allows for better performance by leveraging short-circuit reads (SCR) and block locality; and second, it helps to control hardware costs by eliminating expensive RAID controllers. Disk count is not currently a major factor for an HBase-only cluster (one where no MapReduce, Impala, Solr, or other applications are running); HBase functions quite well with 8–12 disks per node. The HBase write path is limited due to HBase’s choice to favor consistency over availability. HBase has implemented a write-ahead log (WAL) that must acknowledge each write as it comes in. The WAL records every write to disk as it arrives, which slows ingest through the API, REST, and Thrift interfaces and creates a write bottleneck on a single drive. Planning the future drive count will become more important with the work being done in HBASE-10278, an upstream JIRA to add multi-WAL support; it will remove the WAL write bottleneck and spread the incoming write workload over all of the disks in each node. HBase shares a similar mentality with Hadoop in that SSDs are currently overkill and not necessary from a deployment standpoint. When looking to future-proof a long-term HBase cluster, purchasing 25%–50% of the storage as SSDs could be beneficial; this accounts for multi-WAL and archival storage.

Archival Storage

For more information about archival storage, check out The Apache Software Foundation’s guide to archival storage, SSD, and memory.

We would only recommend adding the SSDs when dealing with tight SLAs, as they are still quite expensive per GB.

Networking

Networking is an important consideration when designing an HBase cluster. HBase clusters typically employ standard Gigabit Ethernet (1 GbE) or 10 Gigabit Ethernet (10 GbE) rather than more expensive alternatives such as fiber or InfiniBand. The minimum recommendation is bonded 1 GbE with either two or four ports, but because both hardware and software continue to scale up, 10 GbE is ideal (it’s always easier to plan for the future now). 10 GbE really shines during hardware failures and major compactions. During hardware failures, the underlying HDFS has to re-replicate all of the data stored on that node; during major compactions, HFiles are rewritten to be local to the RegionServer that is hosting the data, which can lead to remote reads on clusters that have experienced node failures or had many regions rebalanced. Both scenarios can saturate the network, which can impact tight SLAs.

Once the cluster moves to multiple racks, top-of-rack (TOR) switches will need to be selected. TOR switches connect the nodes and bridge multiple racks. For a successful deployment, the TOR switches should be no slower than 10 GbE for inter-rack connections (Figure 4-1). Because it is important to eliminate single points of failure, redundancy between racks is highly recommended; although HBase and Hadoop can survive a lost rack, it doesn’t make for a fun experience. If the cluster gets so large that HBase begins to cross multiple aisles in the datacenter, core/aggregation switches may need to be introduced. These switches should run no slower than 40 GbE, and again, redundancy is recommended.

The cluster should be isolated on its own network and network equipment. Hadoop and HBase can quickly saturate a network, so separating the cluster on its own network helps ensure HBase does not impact any other systems in the datacenter. For ease of administration and security, VLANs may also be implemented on the network for the cluster.

Figure 4-1. Networking example

OS Tuning

There is not a lot of special consideration for the operating system with Hadoop/HBase. The standard operating system for a Hadoop/HBase cluster is Linux based. For any real deployment, an enterprise-grade distribution should be used (e.g., RHEL, CentOS, Debian, Ubuntu, or SUSE). Hadoop writes its blocks directly to the operating system’s filesystem. With any of the newer operating system versions, it is recommended to use EXT4 for the local filesystem. XFS is an acceptable filesystem, but it has not been as widely deployed in production environments. The next set of considerations for the operating system is swapping and swap space. It is important for HBase that swapping is not used by the kernel or the processes. Kernel and process swapping on an HBase node can cause serious performance issues and lead to failures. It is recommended to disable swap space by setting the partition size to 0, and process swapping should be disabled by setting vm.swappiness to 0 or 1.
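
As a quick sanity check, the following short Python sketch (an illustration only, not part of any HBase or Hadoop distribution) reads the kernel’s current swappiness value and reports whether any swap device is active on the node:

    # check_swap.py -- sanity check of swap settings on a Linux HBase node (illustrative sketch)
    def read_swappiness():
        """Return the kernel's current vm.swappiness value."""
        with open("/proc/sys/vm/swappiness") as f:
            return int(f.read().strip())

    def swap_in_use():
        """Return True if any swap device is active (first line of /proc/swaps is a header)."""
        with open("/proc/swaps") as f:
            return len(f.readlines()) > 1

    if __name__ == "__main__":
        swappiness = read_swappiness()
        print("vm.swappiness = %d" % swappiness)
        if swappiness > 1:
            print("WARNING: set vm.swappiness to 0 or 1 (e.g., sysctl -w vm.swappiness=1)")
        if swap_in_use():
            print("WARNING: active swap detected; HBase nodes should run without swap space")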

Hadoop Tuning

Apache HBase 0.98+ only supports Hadoop2 with YARN. It is important to understand the impact that YARN (MR2) can have on an HBase cluster. Numerous cases have been handled by Cloudera support where MapReduce workloads were causing HBase to be starved of resources. This leads to long pauses while waiting for resources to free up.

Hadoop2 with YARN adds an extra layer of complexity, as resource management has been moved into the YARN framework. YARN also allows for more granular tuning of resources between managed services. There is some work upstream to integrate HBase within the YARN framework (HBase on YARN, or HOYA), but it is currently incomplete. This means YARN has to be tuned to allow HBase to run smoothly without being starved of resources. YARN allows for specific tuning around the number of CPUs utilized and memory consumption. The three main settings to take into consideration are:

yarn.nodemanager.resource.cpu-vcores

Number of CPU cores that can be allocated for containers.

yarn.nodemanager.resource.memory-mb

Amount of physical memory, in megabytes, that can be allocated for containers. When using HBase, it is important not to overallocate memory on the node. It is recommended to allocate 8–16 GB for the operating system, 2–4 GB for the DataNode, 12–24 GB for HBase, and the rest for the YARN framework (assuming there are no other workloads such as Impala or Solr). A worked memory budget is sketched after this list.

yarn.scheduler.minimum-allocation-mb

The minimum allocation for every container request at the ResourceManager, in megabytes. Memory requests lower than this won’t take effect, and the specified value will be allocated at minimum. The recommended value is 1–2 GB.
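
To make the memory guidance above concrete, here is a minimal sketch of a per-node memory budget in Python; the 128 GB node and the individual allocations are assumptions drawn from the ranges above, not fixed requirements:

    # Illustrative per-node memory budget (all values in GB and assumed, not prescriptive)
    total_ram     = 128   # physical memory on the node
    os_reserved   = 16    # operating system
    datanode_heap = 4     # HDFS DataNode
    hbase_heap    = 24    # HBase RegionServer

    # Whatever remains can be offered to YARN containers.
    yarn_memory_gb = total_ram - os_reserved - datanode_heap - hbase_heap            # 84 GB
    print("yarn.nodemanager.resource.memory-mb = %d" % (yarn_memory_gb * 1024))      # 86016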

When performing the calculations, it is important to remember that the vast majority of servers use an Intel chip set with hyper-threading enabled. Hyper-threading allows the operating system to present two virtual or logical cores for every physical core present. Here is some back-of-the-napkin tuning for using HBase with YARN (this will vary depending on workloads—your mileage may vary):

  • physicalCores * 1.5 = total v-cores

We also must remember to leave room for HBase:

  • Total v-cores – 1 for HBase – 1 for DataNode – 1 for NodeManager – 1 for OS ... – 1 for any other services such as Impala or Solr

For an HBase and YARN cluster with a dual 10-core processor, the tuning would look as follows:

  • 20 * 1.5 = 30 virtual CPUs
  • 30 – 1 – 1 – 1 – 1 = 26 total v-cores

Some sources recommend going one step further and dividing the 26 v-cores in half when using HBase with YARN. You should test your application at production levels to determine the best tuning.
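
The same back-of-the-napkin arithmetic can be captured in a short Python sketch; the dual 10-core node and the list of co-located services are assumptions carried over from the example above:

    # Illustrative v-core budget for a dual 10-core node with hyper-threading enabled
    physical_cores = 20
    total_vcores   = int(physical_cores * 1.5)                  # 30

    # Reserve one v-core apiece for co-located services (extend for Impala, Solr, etc.).
    reserved_for = ["HBase", "DataNode", "NodeManager", "OS"]
    yarn_vcores  = total_vcores - len(reserved_for)             # 26

    # Some sources suggest halving this again when HBase shares the cluster with YARN.
    conservative_yarn_vcores = yarn_vcores // 2                 # 13
    print(yarn_vcores, conservative_yarn_vcores)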

HBase Tuning

As stated at the beginning of the chapter, HBase sizing requires a deep understanding of three project requirements: workload, SLAs, and capacity. Workload and SLAs can be organized into three main categories: read, write, or mixed. Each of these categories has its own set of tunable parameters, SLA requirements, and sizing concerns from a hardware-purchase standpoint.

The first one to examine is the write-heavy workload. There are two main ways to get data into HBase: either through an API (Java, Thrift, or REST) or by using bulk load. This is an important distinction to make, as the API path uses the WAL and memstore, while bulk load is a short-circuit write that bypasses both. As mentioned in “Storage”, the primary bottleneck for HBase is the WAL, followed by the memstore. There are a few quick formulas for determining optimal write performance in an API-driven write model.

To determine region count per node:

  • HBaseHeap * memstoreUpperLimit = availableMemstoreHeap
  • availableMemstoreHeap / memstoreSize = recommendedActiveRegionCount

(This is actually per column family; the formula assumes a single column family.)

To determine raw space per node:

  • recommendedRegionCount * maxfileSize * replicationFactor = rawSpaceUsed

To determine the number of WALs to keep:

  • availableMemstoreHeap / (WALSize * WALMultiplier) = numberOfWALs

Here is an example of this sizing using real-world numbers:

  • HBase heap = 16 GB
  • Memstore upper limit = 0.5
  • Memstore size = 128 MB
  • Maximum file size = 20 GB
  • WAL size = 128 MB
  • WAL rolling multiplier = 0.95

First, we’ll determine region count per node:

  • 16,384 MB * 0.5 = 8,192 MB
  • 8,192 MB / 128 MB = 64 activeRegions

Next, we’ll determine raw space per node:

  • 64 activeRegions * 20 GB * 3 = 3.75 TB used

Finally, we need to determine the number of WALs to keep:

  • 8,192 MB / (128 MB * 0.95) = 67 WALs

When using this formula, it is recommended to have no more than 64 active regions per RegionServer while keeping a total of 67 WALs. This will use 3.75 TB (with replication) of storage per node for the HFiles, which does not include space used for snapshots or compactions.
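
These formulas are easy to capture in a few lines of Python; the sketch below uses the same example values as above and reproduces the 64-region, 67-WAL, 3.75 TB result:

    # Illustrative write-path sizing using the example values from this section
    hbase_heap_mb        = 16 * 1024   # 16 GB heap
    memstore_upper_limit = 0.5         # fraction of heap available to memstores
    memstore_size_mb     = 128
    max_file_size_gb     = 20
    replication_factor   = 3
    wal_size_mb          = 128
    wal_multiplier       = 0.95

    available_memstore_heap_mb = hbase_heap_mb * memstore_upper_limit                      # 8,192 MB
    recommended_regions = int(available_memstore_heap_mb / memstore_size_mb)               # 64
    raw_space_tb = recommended_regions * max_file_size_gb * replication_factor / 1024.0    # 3.75 TB
    number_of_wals = int(available_memstore_heap_mb / (wal_size_mb * wal_multiplier))      # 67

    print(recommended_regions, number_of_wals, round(raw_space_tb, 2))   # 64 67 3.75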

Bulk loads are a short-circuit write and do not run through the WAL, nor do they use the memstore. During the reduce phase of the MapReduce job, the HFiles are created and then loaded using the completebulkload tool. Most of the region count limitations come from the WAL and memstore; when using bulk load, the region count is limited by index size and response time. It is recommended to test read response time scaling up from 100 regions; it is not unheard of for successful deployments to have 150–200 regions per RegionServer. When using bulk load, it is important to reduce the memstore upper limit (hbase.regionserver.global.memstore.upperLimit) and lower limit (hbase.regionserver.global.memstore.lowerLimit) to 0.11 and 0.10, respectively, and then raise the block cache to 0.6–0.7 depending on available heap. Memstore and block cache tuning will allow HBase to keep more data in memory for reads.

Tip

It is important to note these values are percentages of the heap devoted to the memstore and the block cache.
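
Expressed as configuration properties, the bulk-load tuning above might look like the following sketch; the values mirror the numbers just discussed, with 0.65 chosen as an assumed point within the 0.6–0.7 block cache range, and the Python dictionary format is only for illustration:

    # Illustrative hbase-site.xml values for a bulk-load-oriented profile (shown as a dict)
    bulk_load_profile = {
        "hbase.regionserver.global.memstore.upperLimit": 0.11,  # shrink the memstore share of heap
        "hbase.regionserver.global.memstore.lowerLimit": 0.10,
        "hfile.block.cache.size": 0.65,                          # assumed point in the 0.6-0.7 range
    }

    # Sanity check: HBase refuses to start if the memstore upper limit plus the
    # block cache fraction exceeds 0.8 of the heap.
    assert (bulk_load_profile["hbase.regionserver.global.memstore.upperLimit"]
            + bulk_load_profile["hfile.block.cache.size"]) <= 0.8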

The same usable-space-per-node calculation from before can be used once the desired region count has been determined through proper testing.

Again, the formula to determine raw space per node is as follows:

  • recommendedRegionCount * maxfileSize * replicationFactor = rawSpaceUsed

Using this formula, we’d calculate the raw space per node as follows in this instance:

  • 150 activeRegions * 20 GB * 3 = 9 TB used

Capacity is the easiest portion to calculate. Once the workload has been identified and properly tested (noticing a theme with testing yet?), it is a simple matter of dividing the total amount of data by capacity per node. It is important to leave scratch space available (the typical recommendation is 20%–30% overhead). This space will be used for MapReduce scratch space, extra space for snapshots, and major compactions. Also, if using snapshots with high region counts, testing should be done to ensure there will be enough space for the desired backup pattern, as snapshots can quickly take up space.
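
As a simple illustration of the capacity math, the following sketch divides an assumed total data size by the usable capacity per node, reserving 25% of raw disk for scratch space; the 100 TB total and the 12 × 4 TB disk layout are placeholders for the example only:

    # Illustrative capacity estimate (all figures are assumptions for the example)
    import math

    total_data_tb    = 100.0   # replicated data the cluster must hold over its lifetime
    disks_per_node   = 12
    disk_size_tb     = 4.0
    scratch_overhead = 0.25    # reserve ~20%-30% for MapReduce scratch, snapshots, and compactions

    usable_per_node_tb = disks_per_node * disk_size_tb * (1 - scratch_overhead)   # 36 TB
    nodes_needed = math.ceil(total_data_tb / usable_per_node_tb)                  # 3
    print(usable_per_node_tb, nodes_needed)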

Different Workload Tuning

For primarily read workloads, it is important to understand and test the SLA requirements. Rigorous testing is required to determine the best settings. The primary settings to tweak are the same ones as for write workloads (i.e., lowering the memstore settings and raising the block cache to allow more data to be stored in memory). In HBase 0.96, BucketCache was introduced, which allows data to be cached either in memory or on low-latency disk (SSD/flash cards). You should use 0.98.6 or higher for this feature due to HBASE-11678.
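
If BucketCache is enabled for a read-heavy workload, the two properties involved are hbase.bucketcache.ioengine and hbase.bucketcache.size. The sketch below is an assumed off-heap example, not a recommendation; the 16 GB size and the SSD path are placeholders:

    # Illustrative BucketCache settings (hbase-site.xml values shown as a dict; sizes are assumptions)
    bucket_cache_profile = {
        "hbase.bucketcache.ioengine": "offheap",   # or "file:/ssd/bucketcache" to cache on SSD/flash
        "hbase.bucketcache.size": 16384,           # cache size in MB (16 GB, assumed)
    }
    # An off-heap bucket cache also requires enough direct memory
    # (-XX:MaxDirectMemorySize in the RegionServer's JVM options) to back it.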

In a mixed workload environment, HBase BucketCache becomes much more interesting, as it allows HBase to be tuned to accept more writes while maintaining good read response times. When running in a mixed environment, it is important to overprovision your cluster, as doing so will allow for the loss of a RegionServer or two. It is recommended to allow one extra RegionServer per rack when sizing the cluster. This will also give the cluster better response times, with more memory for reads and extra WALs for writes. Again, this is about finding the right balance for the workload, which means tuning the number of WALs, the memstore limits, and the total allocated block cache.

HBase can run into multiple issues when leveraging other Hadoop components. We will cover deploying HBase with Impala, Solr, and Spark throughout Part II. Unfortunately, there really isn’t a great answer for running HBase with numerous components. The general concerns that arise include the following:

  • CPU contention

  • Memory contention

  • I/O contention

The first two are reasonably easy to address with proper YARN tuning (this means running Spark in YARN mode) and overprovisioning of resources. What we mean here is: if you know your application needs W GB for YARN containers, Impala will use X GB of memory, the HBase heap needs Y GB to meet SLAs, and you want to reserve Z GB of RAM for the OS, then the formula should be:

  • W + X + Y + Z = totalMemoryNeeded

This formula makes sense at first glance, but unfortunately it rarely works out to be as straightforward. We would recommend planning for at least 25%–30% free memory. The benefits of this approach are twofold: first, when it’s time to purchase additional nodes, it will give you room to grow vertically until those new nodes are added to the cluster; and second, the OS will take advantage of the additional memory with OS buffers, helping general performance across the board.
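
Put into numbers, the same headroom rule might look like the following sketch; every figure here is a placeholder rather than a recommendation:

    # Illustrative node memory check with 25%-30% headroom (all GB values assumed)
    yarn_containers = 48   # W: memory needed for YARN containers
    impala          = 32   # X: Impala memory limit
    hbase_heap      = 24   # Y: HBase heap required to meet SLAs
    os_reserved     = 16   # Z: RAM kept for the operating system

    total_needed = yarn_containers + impala + hbase_heap + os_reserved   # 120 GB
    headroom     = 0.30                                                  # keep ~25%-30% of RAM free
    recommended_node_ram = total_needed / (1 - headroom)                 # ~171 GB, so spec 192 GB nodes
    print(total_needed, round(recommended_node_ram))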

The I/O contention is where the story really starts to fall apart. The only viable answer is to leverage control groups (cgroups) in Linux. Setting up cgroups can be quite complex, but they provide the ability to assign specific processes, such as Impala or, more likely, YARN/MapReduce, to specific I/O-throttled groups.

Most people do not bother with configuring cgroups or other throttle technologies. We tend to recommend putting memory limits on Impala, container limits on YARN, and memory and heap limits on Solr. Then properly testing your application at scale will allow for fine-tuning of the components working together.

Our colleague Eric Erickson, who specializes in all things Solr, has trademarked the phrase “it depends” when asked about sizing. He believes that back-of-the-napkin math is a great starting point, but only proper testing can verify the sizing estimates. The good news with HBase and Hadoop is that they provide linear scalability, which allows for testing on a smaller scale and then scaling out from there. The key to a successful HBase deployment is no secret: utilize commodity hardware, keep the networking simple using Ethernet rather than more expensive technologies such as fiber or InfiniBand, and maintain a detailed understanding of all aspects of the workload.
