HBase Administration Cookbook

by Yifeng Jiang
August 2012
Content level: Intermediate to advanced
332 pages
7h 3m
English
Packt Publishing

Full shutdown backup using distcp

distcp (distributed copy) is a tool provided by Hadoop for copying large datasets within a single HDFS cluster or between different HDFS clusters. It uses MapReduce to copy files in parallel, handle errors and recovery, and report the job status.

Because HBase stores all of its files, including system files, on HDFS, we can back up the source HBase cluster simply by using distcp to copy the HBase directory either to another directory on the same HDFS or to a different HDFS.

Note that this is a full shutdown backup solution. distcp is safe here only because the HBase cluster is shut down (or all tables are disabled), so no files are edited during the copy. Do not use distcp on a live HBase cluster. Therefore, this solution is for the ...
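
As a rough illustration of the idea, the following sketch builds the distcp command for copying the HBase directory to a second HDFS cluster. The NameNode addresses, the /hbase root directory, the /backup/hbase destination, and the RUN_BACKUP switch are all assumptions made for this example, not values taken from the recipe. By default the script only prints the command, so it can be checked before anything is copied.

```shell
#!/bin/sh
# Sketch of a full shutdown backup with distcp.
# Assumed values (not from the recipe): both NameNode URIs, the /hbase
# root directory, and the /backup/hbase destination path.
SRC="${SRC:-hdfs://source-nn:8020/hbase}"
DST="${DST:-hdfs://backup-nn:8020/backup/hbase}"

# Print the distcp command for a given source and destination.
build_distcp_cmd() {
    printf 'hadoop distcp %s %s\n' "$1" "$2"
}

# HBase must already be stopped before the copy starts
# (for example with $HBASE_HOME/bin/stop-hbase.sh).
# RUN_BACKUP is a hypothetical switch for this sketch: the copy runs
# only when it is set to 1; otherwise the command is just printed.
if [ "${RUN_BACKUP:-0}" = "1" ]; then
    hadoop distcp "$SRC" "$DST"
else
    build_distcp_cmd "$SRC" "$DST"
fi
```

To back up to another directory on the same HDFS instead, point DST at a path on the same NameNode as SRC. Run the script with RUN_BACKUP=1 only after confirming that HBase is stopped.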



Publisher Resources

ISBN: 9781849517140