Pulling data from XML with DataImportHandler (Intermediate)

There are two ways to get structured data into Solr: push and pull. We have already explored pushing the data; let's have a look at pulling the data.

The DataImportHandler (DIH) is a specialized handler that can connect to multiple types of external data sources and then import, normalize, map, and post-process the data to make it ready for Solr. It handles both full and incremental (delta) imports.
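Full and delta imports are triggered by sending commands to the handler over HTTP. The Python sketch below only builds the request URLs and does not contact a server. It assumes a Solr release that still bundles DIH, a core named `mycore` on `localhost:8983`, and the handler registered at `/dataimport` in `solrconfig.xml`. All of these are hypothetical placeholders for your own setup. The `full-import`, `delta-import`, and `status` commands and the `clean` and `commit` parameters are standard DIH request parameters.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: host, port, and core name "mycore" are assumptions,
# and "/dataimport" is the path the handler is conventionally registered under.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"
DIH_PATH = "/dataimport"


def dih_command_url(command, **params):
    """Build a DataImportHandler request URL for the given command.

    `command` is a DIH command such as "full-import", "delta-import",
    or "status"; extra keyword arguments become query parameters.
    """
    query = {"command": command}
    # Solr expects lowercase "true"/"false" for boolean parameters.
    for key, value in params.items():
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{SOLR_CORE_URL}{DIH_PATH}?{urlencode(query)}"


# Full import: clean=true removes existing documents before re-importing everything.
full_url = dih_command_url("full-import", clean=True, commit=True)

# Delta import: picks up only changes defined by the entity's delta configuration,
# so existing documents are kept (clean=false).
delta_url = dih_command_url("delta-import", clean=False, commit=True)

# Status: poll this to check whether an import is still running.
status_url = dih_command_url("status")

if __name__ == "__main__":
    for url in (full_url, delta_url, status_url):
        print(url)
    # Against a running Solr instance, each URL could be sent with
    # urllib.request.urlopen(url).
```

Because imports run asynchronously, a typical workflow would issue `full-import` once, poll `status` until it reports completion, and afterwards schedule `delta-import` requests for incremental updates.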

DIH, like Solr itself, can scale to import large datasets such as English Wikipedia (about 20 GB of structured text at the last recorded attempt). However, DIH is most commonly used for small- to medium-sized imports. It is often a reasonable way to get an initial dataset into Solr for ...
