Advanced topics with Solr

So far, we have dealt with various kinds of data and their types. Most enterprise search requirements can be addressed with the techniques we have already covered. In this section, we will go through some advanced topics for analyzing your data with Solr. We will also explore integration with NLP tools to make the incoming data more meaningful and useful.

Deduplication

Deduplication in Apache Solr is about preventing duplicate documents from entering the Apache Solr index. Apache Solr can prevent these duplicates at the document level as well as at the field level. This is a new feature of the Apache Solr 4.x release. Duplicates are avoided by means of hashing techniques: a hash is computed from a document's content, and two documents that produce the same hash are treated as duplicates. Apache Solr supports native de-duplication ...
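The hashing idea can be illustrated with a minimal sketch. This is not Apache Solr's implementation; it is a standalone Python illustration under stated assumptions. The names `signature` and `DedupStore`, the choice of SHA-256, and the sample fields `title` and `body` are all hypothetical and chosen only for this example.

```python
import hashlib


def signature(doc, fields):
    """Return a hex digest computed over the chosen fields of a document.

    Hypothetical helper: the hash function (SHA-256) and the way fields
    are combined are assumptions made for illustration only.
    """
    h = hashlib.sha256()
    for name in sorted(fields):
        value = str(doc.get(name, ""))
        # A separator byte keeps ("ab", "c") distinct from ("a", "bc").
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(value.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


class DedupStore:
    """Toy in-memory store that keeps one document per signature."""

    def __init__(self, fields):
        self.fields = list(fields)
        self.docs = {}

    def add(self, doc):
        """Store the document; return True if its signature was not seen before."""
        sig = signature(doc, self.fields)
        is_new = sig not in self.docs
        # A later document with the same signature overwrites the earlier one.
        self.docs[sig] = doc
        return is_new


store = DedupStore(fields=["title", "body"])
first = store.add({"id": "1", "title": "Solr", "body": "Search server"})
second = store.add({"id": "2", "title": "Solr", "body": "Search server"})
third = store.add({"id": "3", "title": "Solr", "body": "Scaling guide"})
```

In this sketch the second document has a different `id` but the same `title` and `body`, so it hashes to the same signature and replaces the first rather than being stored alongside it. Which fields feed the signature decides what counts as a duplicate: hashing every field gives document-level detection, while hashing a subset gives field-level detection.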
