Chapter 8. Searching and Indexing

In this chapter, we will cover the following recipes:

Generating an inverted index using Hadoop MapReduce
Intradomain web crawling using Apache Nutch
Indexing and searching web documents using Apache Solr
Configuring Apache HBase as the backend data store for Apache Nutch
Whole web crawling with Apache Nutch using a Hadoop/HBase cluster
Elasticsearch for indexing and searching
Generating the in-links graph for crawled web pages

Introduction

MapReduce frameworks are well suited to large-scale search and indexing applications. In fact, Google developed the original MapReduce framework specifically to support the various operations involved in web search. The Apache Hadoop project was also started as a subproject ...

Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne
