Advanced topics with Solr
So far, we have dealt with various kinds of data and their types. Most enterprise search cases can be addressed with the techniques we have covered. In this section, we will go through some advanced topics for analyzing your data with Solr. We will also explore integration with NLP tools to make the incoming data more meaningful and useful.
Deduplication in Apache Solr is about preventing duplicate documents from entering the Apache Solr index. Apache Solr can prevent these duplicates at the document level as well as the field level. This feature is available in the Apache Solr 4.x releases. Duplicates are avoided by means of hashing techniques: a hash (signature) is computed from a document's content, and two documents with the same signature are treated as duplicates. Apache Solr supports native de-duplication ...
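To make the hashing idea concrete, here is a minimal conceptual sketch in Python. It is not Solr's implementation or API: the names `signature` and `DedupStore`, the choice of SHA-256, and the lowercase and whitespace normalization are all assumptions made for illustration. It only shows how a signature computed over selected fields lets a store recognize a duplicate and overwrite the earlier copy.

```python
import hashlib


def signature(doc, fields):
    """Return a hex digest computed over the chosen fields of a document.

    Hypothetical helper: SHA-256 and the normalization below are
    illustrative choices, not what Solr does internally.
    """
    h = hashlib.sha256()
    for name in fields:
        value = str(doc.get(name, ""))
        # Lowercase and collapse whitespace so trivial variations
        # produce the same signature.
        normalized = " ".join(value.lower().split())
        # Separate field name and value with a NUL byte so that
        # adjacent fields cannot run together ambiguously.
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(normalized.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


class DedupStore:
    """Toy in-memory store keyed by signature (hypothetical)."""

    def __init__(self, fields):
        self.fields = fields
        self.docs = {}  # signature -> document

    def add(self, doc):
        """Store doc; return True if it is new, False if it replaced a duplicate."""
        sig = signature(doc, self.fields)
        is_new = sig not in self.docs
        self.docs[sig] = doc  # a duplicate overwrites the earlier copy
        return is_new


store = DedupStore(fields=["title", "body"])
first = store.add({"id": "1", "title": "Solr Guide", "body": "Intro  text"})
second = store.add({"id": "2", "title": "solr guide", "body": "intro text"})
```

In this sketch the second document differs from the first only in case and spacing, so it hashes to the same signature and replaces the first instead of being stored alongside it. Which fields feed the signature decides what counts as a duplicate: hashing every field catches only exact copies, while hashing a subset treats documents that agree on those fields as the same.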