
Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Chapter 8. Searching and Indexing

In this chapter, we will cover the following recipes:

  • Generating an inverted index using Hadoop MapReduce
  • Intradomain web crawling using Apache Nutch
  • Indexing and searching web documents using Apache Solr
  • Configuring Apache HBase as the backend data store for Apache Nutch
  • Whole web crawling with Apache Nutch using a Hadoop/HBase cluster
  • Elasticsearch for indexing and searching
  • Generating the in-links graph for crawled web pages

Introduction

MapReduce frameworks are well suited for large-scale search and indexing applications. In fact, Google came up with the original MapReduce framework specifically to facilitate the various operations involved with web searching. The Apache Hadoop project was also started as a subproject ...
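Indexing fits the MapReduce model well because each stage maps onto a phase of the framework. The map phase tokenizes each document and emits (term, documentId) pairs. The shuffle groups those pairs by term, and the reduce phase collapses each group into a posting list. The following is a minimal in-memory sketch of that data flow in plain Java. It is not the Hadoop `Mapper`/`Reducer` API and not this chapter's recipe code. The class and method names (`InvertedIndexSketch`, `map`, `reduce`, `buildIndex`) are hypothetical, and the tokenizer (lowercase, split on non-word characters) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, single-process simulation of an inverted-index MapReduce job.
class InvertedIndexSketch {

    // Map phase: tokenize one document and emit a (term, docId) pair per token.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty first token
                out.add(Map.entry(token, docId));
            }
        }
        return out;
    }

    // Shuffle + reduce: group pairs by term (TreeMap keeps terms sorted) and
    // collapse each group into a sorted, de-duplicated posting list.
    static SortedMap<String, SortedSet<String>> reduce(List<Map.Entry<String, String>> pairs) {
        SortedMap<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            index.computeIfAbsent(p.getKey(), k -> new TreeSet<>()).add(p.getValue());
        }
        return index;
    }

    // Drives the map phase over every document, then runs the reduce phase.
    static SortedMap<String, SortedSet<String>> buildIndex(Map<String, String> docs) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            emitted.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "Hadoop MapReduce for search");
        docs.put("doc2", "Search and indexing with Nutch");
        System.out.println(buildIndex(docs));
    }
}
```

Running `main` maps the term `search` to the posting list `[doc1, doc2]`. In a real Hadoop job, the framework distributes the map calls across the cluster and performs the grouping step for you. That division of labor is what makes the model scale to web-sized corpora.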