O'Reilly logo

Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Configuring Apache HBase as the backend data store for Apache Nutch

Apache Nutch integrates Apache Gora to add support for different backend data stores. In this recipe, we are going to configure Apache HBase as the backend data storage for Apache Nutch. Similarly, it is possible to plug in data stores such as RDBMS databases, Cassandra, and others through Gora.

This recipe builds upon the instructions given at http://wiki.apache.org/nutch/Nutch2Tutorial.

Note

As of Apache Nutch 2.2.1 release, the Nutch project has not officially migrated to Hadoop 2.x and still depends on Hadoop 1.x for the whole web crawling. However, it is possible to execute the Nutch jobs using a Hadoop 2.x cluster utilizing the backward compatibility nature of Hadoop.

Nutch ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required