
Apache Solr for Indexing Data by Anshul Johri, Sachin Handiekar


Chapter 9. Advanced Topics – Multilanguage, Deduplication, and Others

In the previous chapter, we saw how Solr can retrieve documents in real time as they are indexed. In this chapter, we'll cover some advanced topics that will help us make full use of Solr.

Specifically, we'll cover the following topics in this chapter:

  • Indexing a document in multiple languages
  • Detecting duplicate documents (deduplication)
  • Streaming of documents in Solr (content streaming)
  • UIMA integration with Solr

Multilanguage indexing

Solr provides a way to index documents written in multiple languages. In this section, we'll cover how to index multilanguage documents in Solr and how to detect a document's language automatically.
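A common pattern when language is auto-detected is to route each text field into a language-specific field, for example `title` becoming `title_de` for a German document. Each language-specific field can then use its own analyzer. Solr's language-identification update processors can be configured to rename fields this way. The following Python sketch only illustrates the idea outside Solr. The stopword-counting detector, the `detect_language` and `map_to_language_fields` helpers, and the `language_s` field name are all hypothetical and are not Solr's implementation.

```python
# Toy stopword lists: a hypothetical stand-in for a real language detector
# such as the ones used by Solr's language-identification processors.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "die", "und", "ist", "das"},
    "fr": {"le", "la", "et", "est", "les"},
}


def detect_language(text, fallback="en"):
    """Return the language whose stopwords occur most often in text."""
    words = text.lower().split()
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # If no stopword matched at all, fall back to a default language.
    return best if scores[best] > 0 else fallback


def map_to_language_fields(doc, fields, lang_field="language_s"):
    """Copy doc, renaming each listed field to <field>_<lang> and recording the language."""
    text = " ".join(str(doc[f]) for f in fields if f in doc)
    lang = detect_language(text)
    mapped = {}
    for key, value in doc.items():
        if key in fields:
            mapped[f"{key}_{lang}"] = value  # e.g. title -> title_de
        else:
            mapped[key] = value  # fields not being mapped are kept unchanged
    mapped[lang_field] = lang
    return mapped


if __name__ == "__main__":
    doc = {"id": "1", "title": "Der Hund und die Katze"}
    print(map_to_language_fields(doc, ["title"]))
```

Running the script prints `{'id': '1', 'title_de': 'Der Hund und die Katze', 'language_s': 'de'}`. In a real Solr setup, detection and field mapping happen inside the update chain, and the schema would declare the language-specific fields (or dynamic fields) with suitable analyzers.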

Let's create a new core called ...
