Apache Solr Enterprise Search Server - Third Edition by Matt Mitchell, Kranti Parisa, Eric Pugh, David Smiley

The DataImportHandler framework

Solr includes a very popular contrib module for importing data, known as the DataImportHandler. It is a data processing pipeline built specifically for Solr. Its notable capabilities are as follows:

  • It imports data from databases through JDBC (Java Database Connectivity). It can import only the changed records, provided that each record carries a last-updated date
  • It imports data from a URL (HTTP GET)
  • It imports data from files (that is, it crawls files)
  • It imports e-mail from an IMAP server, including attachments
  • It supports combining data from different sources
  • It extracts text and metadata from rich document formats
  • It applies XSLT transformations and XPath extraction on XML data
  • It includes a diagnostic/development tool
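An import is started by sending an HTTP request to the handler with a command parameter. The following is a minimal sketch of how such request URLs could be built for a full import and for a changed-records-only (delta) import. It assumes the handler is registered at /dataimport in solrconfig.xml, which is a common convention but not guaranteed. The host, port, core name (mycore), and the dih_url helper are hypothetical and serve only as an illustration.

```python
from urllib.parse import urlencode


def dih_url(base_url, core, command, handler="/dataimport", **params):
    """Build a DataImportHandler request URL.

    Assumptions: the handler path defaults to "/dataimport" (it depends on
    how the handler is registered in solrconfig.xml); base_url and core are
    placeholders for your own Solr installation.
    """
    # The command goes first, followed by any extra options in the order given.
    query = urlencode({"command": command, **params})
    return f"{base_url.rstrip('/')}/{core}{handler}?{query}"


# Hypothetical local Solr instance and core name.
BASE = "http://localhost:8983/solr"

# Re-import everything, clearing the index first and committing at the end.
full_import = dih_url(BASE, "mycore", "full-import", clean="true", commit="true")

# Import only the records changed since the last run.
delta_import = dih_url(BASE, "mycore", "delta-import")

print(full_import)
print(delta_import)
```

Issuing an HTTP GET against either URL (from a browser, curl, or urllib.request.urlopen) would trigger the corresponding import on a running Solr instance that has the handler configured.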

Furthermore, you ...
