Instant Apache Solr for Indexing Data How-to by Alexandre Rafalovitch

Pulling data from XML with DataImportHandler (Intermediate)

There are two ways to get structured data into Solr: push and pull. We have already explored pushing the data; let's have a look at pulling the data.

The Data Import Handler (DIH) is a specialized handler that can connect to several types of external data sources and then import, normalize, map, and post-process the data to make it ready for Solr. It supports both full and incremental (delta) imports.
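
As a rough illustration of the two import modes, the sketch below builds the request URLs that would start a full import and a delta import. It only constructs the URLs and sends nothing. The host, the core name `collection1`, and the `/dataimport` handler path are assumptions here (the path is whatever name the handler is registered under in `solrconfig.xml`), and `dih_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def dih_url(base_url, core, command, **params):
    """Build a DIH request URL for the given command (no request is sent).

    Assumes the handler is registered at /dataimport for this core.
    """
    query = urlencode({"command": command, **params})
    return f"{base_url}/{core}/dataimport?{query}"


# Hypothetical local Solr instance and core name.
BASE = "http://localhost:8983/solr"
CORE = "collection1"

# Full import: re-read everything from the source, clearing the index first.
full_import = dih_url(BASE, CORE, "full-import", clean="true", commit="true")

# Delta import: pick up only what changed since the last import.
delta_import = dih_url(BASE, CORE, "delta-import", commit="true")

print(full_import)
print(delta_import)
```

Requesting either URL (from a browser or an HTTP client) would ask DIH to start the corresponding import; `command=status` reports on its progress.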

DIH, and Solr itself, can scale to large datasets such as English Wikipedia (about 20 GB of structured text at the last recorded attempt). Most of the time, however, DIH is used for small to medium-sized imports. It is often a reasonable way to get an initial dataset into Solr for ...
