Time for action – refactoring the schema.xml file for the paintings core by introducing tokenization and stop words

We will rewrite the configuration to make it adaptable to real-world text by introducing stop words and a common approach to word tokenization:

  1. Starting from a copy of the schema designed earlier, we add two new field types in the <types> section:
    <fieldType name="text_general" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
    ...
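
To get a feel for what the text_general analyzer chain does to a piece of text, the following Python sketch imitates its four stages in the order in which they are declared. It is only a rough approximation and not Solr code: the accent folding stands in for the mapping-ISOLatin1Accent.txt rules, the regular expression stands in for StandardTokenizerFactory, and the STOP_WORDS set is a hypothetical replacement for stopwords.txt, whose contents are not shown here.

```python
import re
import unicodedata

# Hypothetical stop word list: the real stopwords.txt contents are not shown in the text.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def fold_accents(text):
    """Rough stand-in for MappingCharFilterFactory: strip diacritics from characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Rough stand-in for StandardTokenizerFactory: split on non-word characters."""
    return re.findall(r"\w+", text)


def remove_stop_words(tokens, ignore_case=True):
    """Drop stop words; with ignore_case the comparison ignores letter case."""
    def is_stop(token):
        return (token.lower() if ignore_case else token) in STOP_WORDS
    return [t for t in tokens if not is_stop(t)]


def analyze(text):
    """Apply the stages in the same order as the <analyzer> definition."""
    tokens = tokenize(fold_accents(text))                  # char filter, then tokenizer
    tokens = remove_stop_words(tokens, ignore_case=True)   # stop filter
    return [t.lower() for t in tokens]                     # lowercase filter


print(analyze("The Persistence of Mémory"))
```

Because the stop filter runs before the lowercase filter, ignoreCase="true" is what lets a capitalized "The" match a lowercase entry in the stop word list.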
