Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Loading large datasets to an Apache HBase data store – importtsv and bulkload

Apache HBase is well suited to storing large-scale data in a semi-structured form, either as input for further processing by Hadoop MapReduce programs or as random-access storage for client applications. In this recipe, we import a large text dataset into HBase using the importtsv and bulkload tools.
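As a rough orientation, the two tools are typically run in sequence: importtsv parses the TSV input (optionally writing HFiles instead of inserting rows directly), and the bulk load step then moves those HFiles into the table. The following Python sketch only assembles the two command lines and does not run them. The table name, column family, and HDFS paths are hypothetical placeholders, not values from this recipe, and the `LoadIncrementalHFiles` class name is the one used by HBase 1.x and earlier; newer releases may place it in a different package.

```python
import shlex

IMPORTTSV_CLASS = "org.apache.hadoop.hbase.mapreduce.ImportTsv"
# Assumption: HBase 1.x class location; later versions may differ.
BULKLOAD_CLASS = "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles"


def importtsv_command(table, columns, input_dir, hfile_dir=None):
    """Build the argument list for the importtsv MapReduce job.

    columns maps TSV fields, in order, to HBase columns and must contain
    the special HBASE_ROW_KEY marker exactly once.
    If hfile_dir is given, the job writes HFiles there for a later bulk
    load instead of writing to the table directly.
    """
    if columns.count("HBASE_ROW_KEY") != 1:
        raise ValueError("columns must contain HBASE_ROW_KEY exactly once")
    argv = ["hbase", IMPORTTSV_CLASS,
            "-Dimporttsv.columns=" + ",".join(columns)]
    if hfile_dir:
        argv.append("-Dimporttsv.bulk.output=" + hfile_dir)
    argv += [table, input_dir]
    return argv


def bulkload_command(hfile_dir, table):
    """Build the argument list that loads generated HFiles into a table."""
    return ["hbase", BULKLOAD_CLASS, hfile_dir, table]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    columns = ["HBASE_ROW_KEY", "msg:from", "msg:subject", "msg:text"]
    steps = [
        importtsv_command("example_table", columns,
                          "example-input-dir", hfile_dir="example-hfiles"),
        bulkload_command("example-hfiles", "example_table"),
    ]
    for argv in steps:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later without shell quoting problems. Omitting `hfile_dir` yields the single-step form in which importtsv writes to the table directly.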

Getting ready

  1. Install and deploy Apache HBase in your Hadoop cluster.
  2. Make sure Python is installed on your Hadoop compute nodes.

How to do it…

The following steps show you how to load the 20news dataset, converted to TSV (tab-separated values), into an HBase table:

  1. Follow the Data preprocessing using Hadoop streaming and Python ...
