Appendix C. Useful data import configurations

As discussed in chapter 12, the Data Import Handler (DIH) lets Solr pull in datasets from many kinds of external sources. In chapter 10, we used the DIH to transform pages from a partial Wikipedia data dump file into Solr documents and index them. This appendix provides more detail on how the DIH was configured to enable this import, and we’ll demonstrate how to import both the full Wikipedia dataset and another large dataset useful for experimentation: a data dump from Stack Exchange.

C.1. Indexing Wikipedia

In chapter 12, we imported a subset of articles from Wikipedia into a preconfigured Solr core named solrpedia. To enable the DIH, several steps ...
