Configuring Solr with Nutch

Apache Solr can easily be configured for use with Apache Nutch. To integrate the two, we perform the following steps:

  1. Create a new core (nutch-example) in Solr by copying the nutch-example folder from the Chapter 7 code that comes with this book.
  2. After creating the new core, we just need to restart the Solr instance.
  3. Once Solr is running again, let's crawl some data using Nutch and index it into Solr. To do this, we'll navigate to the %NUTCH_HOME% folder and execute the following command:
    $ bin/crawl

    Because we ran the script without any arguments, it prints its usage message:

    Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
     -i|--index Indexes crawl results into a configured indexer ...
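The usage message above suggests how a full invocation might be assembled. The following is a minimal sketch, not the book's own command: it assumes a Nutch 1.x release that reads the Solr endpoint from the solr.server.url property (newer releases may configure the indexer elsewhere), a Solr instance on localhost:8983, and hypothetical seed and crawl folder names. It only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Assumed values: none of these come from the book; adjust them to your setup.
SOLR_URL="http://localhost:8983/solr/nutch-example"  # assumed Solr host/port, core from step 1
SEED_DIR="urls"      # hypothetical folder holding the seed URL list
CRAWL_DIR="crawl"    # hypothetical folder where Nutch stores crawl data
NUM_ROUNDS="2"       # number of crawl rounds to run

# Print a bin/crawl command that follows the usage line:
#   crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
build_crawl_cmd() {
  printf 'bin/crawl -i -D "solr.server.url=%s" %s %s %s\n' \
    "$SOLR_URL" "$SEED_DIR" "$CRAWL_DIR" "$NUM_ROUNDS"
}

build_crawl_cmd
```

Once the printed command looks right for your installation, it can be run from the %NUTCH_HOME% folder. The -i flag is what asks the script to index the crawl results, as the usage message describes.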
