Chapter 9. Advanced Topics – Multilanguage, Deduplication, and Others

In the previous chapter, we saw how to use Solr to retrieve indexed documents in real time. In this chapter, we'll cover some advanced topics that will help us use Solr's full potential.

Specifically, we'll cover the following topics in this chapter:

  • Indexing a document in multiple languages
  • Detecting duplicate documents (deduplication)
  • Streaming of documents in Solr (content streaming)
  • UIMA integration with Solr

Multilanguage indexing

Solr lets us index documents written in multiple languages. In this section, we'll cover how to index multilanguage documents in Solr and how to detect a document's language automatically.
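One common convention for multilanguage indexing is to keep a separate field per language (for example, title_en and title_fr) so that each field can use its own analysis chain. The following is a minimal Python sketch of that naming convention only. It does not call Solr, the language code is supplied by hand rather than detected, and the function name map_fields_by_language and the language_s field are hypothetical names chosen for illustration.

```python
def map_fields_by_language(doc, lang, fields, lang_field="language_s"):
    """Return a copy of doc with each listed field renamed to <field>_<lang>.

    doc        -- dict representing one document to be indexed
    lang       -- language code such as "en" or "fr" (assumed already known)
    fields     -- set of field names that hold language-specific text
    lang_field -- hypothetical field used to record the language code
    """
    mapped = {}
    for name, value in doc.items():
        if name in fields:
            # Language-specific text goes into a per-language field.
            mapped[f"{name}_{lang}"] = value
        else:
            # Other fields (such as the id) are copied unchanged.
            mapped[name] = value
    mapped[lang_field] = lang
    return mapped


doc = {"id": "1", "title": "Bonjour le monde", "text": "Ceci est un document."}
mapped_doc = map_fields_by_language(doc, "fr", {"title", "text"})
print(mapped_doc)
```

With this layout, a query can target a specific language by searching the matching field, such as title_fr, while the id field stays the same across languages.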

Let's create a new core called ...
