Chapter 9. Advanced Topics – Multilanguage, Deduplication, and Others

In the previous chapter, we saw how to use Solr to retrieve indexed documents in real time. In this chapter, we'll cover some advanced topics that will help us use Solr to its full potential.

Specifically, we'll cover the following topics in this chapter:

  • Indexing a document in multiple languages
  • Detecting duplicate documents (deduplication)
  • Streaming of documents in Solr (content streaming)
  • UIMA integration with Solr

Multilanguage indexing

Solr lets us index documents written in multiple languages. In this section, we'll cover how to index multilanguage documents in Solr and how to detect a document's language automatically.
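Before we set anything up, it may help to picture what multilanguage indexing usually amounts to: each text field is stored under a language-specific name so that it can be analyzed with that language's rules. The following is only a minimal client-side sketch of that idea in Python, not Solr's own implementation. The function name `route_by_language`, the field names (`title`, `content`, `language`), the `_<lang>` suffix convention, and the `general` fallback are all assumptions made for illustration, and the language code is assumed to have been detected already by some other means.

```python
def route_by_language(doc, lang,
                      text_fields=("title", "content"),
                      supported=("en", "fr", "de"),
                      fallback="general"):
    """Return a copy of doc with its text fields renamed to <field>_<lang>.

    All names here are hypothetical: the field names, the suffix convention,
    and the fallback value are illustrative assumptions, not Solr defaults.
    """
    # Use the fallback suffix when the language is not one we have fields for.
    suffix = lang if lang in supported else fallback

    routed = {}
    for name, value in doc.items():
        if name in text_fields:
            # Text fields go to a language-specific field, e.g. title_fr.
            routed[f"{name}_{suffix}"] = value
        else:
            # Other fields (such as the id) are copied unchanged.
            routed[name] = value

    # Record which suffix was chosen so it can be filtered on later.
    routed["language"] = suffix
    return routed


doc = {"id": "1", "title": "Bonjour", "content": "le monde"}
print(route_by_language(doc, "fr"))
print(route_by_language(doc, "ja"))
```

A French document ends up with `title_fr` and `content_fr`, while a document in a language outside the supported list falls back to `title_general` and `content_general`. Solr's language-identification update processors can perform a comparable mapping on the server side, so a helper like this would only be needed if the routing were done in client code.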

Let's create a new core called ...
