Scaling Apache Solr by Hrishikesh Vijay Karambelkar

Advanced topics with Solr

We have dealt with various kinds of data and their types. Most enterprise search cases can be addressed with the techniques we have covered so far. In this section, we will go through some advanced topics for analyzing your data with Solr. We will also explore integration with NLP tools to make the incoming data more sensible and effective.

Deduplication

Deduplication in Apache Solr is about preventing duplicate documents from entering Apache Solr's storage. Apache Solr can prevent duplicates at the document level as well as the field level. This is a new feature of the Apache Solr 4.x release. Duplicates in the storage are avoided by means of hashing techniques. Apache Solr supports native de-duplication ...
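
To make the hashing idea concrete, here is a minimal sketch in Python. It is not Solr's implementation: the `signature` helper, the `DedupIndex` class, and the choice of MD5 are assumptions made purely for illustration. The sketch hashes a chosen set of fields into a single signature and rejects any document whose signature has already been stored.

```python
import hashlib


def signature(doc, fields):
    """Hash the values of the chosen fields into one hex digest.

    Hypothetical helper: it only illustrates the idea of a
    hash-based document signature.
    """
    h = hashlib.md5()
    for name in sorted(fields):
        value = str(doc.get(name, ""))
        # Separate names and values with a NUL byte so that
        # ("ab", "c") and ("a", "bc") do not hash to the same value.
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(value.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


class DedupIndex:
    """Toy in-memory store that rejects documents with a known signature."""

    def __init__(self, fields):
        self.fields = fields
        self.docs = {}  # signature -> first document seen with it

    def add(self, doc):
        sig = signature(doc, self.fields)
        if sig in self.docs:
            return False  # duplicate: the stored document is kept
        self.docs[sig] = doc
        return True


index = DedupIndex(fields=["title", "body"])
index.add({"id": 1, "title": "Solr", "body": "Search server"})
index.add({"id": 2, "title": "Solr", "body": "Search server"})  # rejected
```

Only the fields passed to `DedupIndex` take part in the signature, so two documents with different `id` values but identical `title` and `body` count as duplicates. Hashing a single field instead of several gives the field-level variant of the same idea.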