O'Reilly logo

Lucene 4 Cookbook by Vineeth Mohan, Edwood Ng

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Obtaining a common analyzer

Lucene provides a set of default analyzers in the lucene-analyzers-common package. Let's take a look at them in detail.

Getting ready

The following are five common analyzers Lucene provides in the lucene-analyzers-common module:

  • WhitespaceAnalyzer: Splits text at whitespaces, just as the name indicates. In fact, this is the only thing this analyzer does.
  • SimpleAnalyzer: Splits text at non-letter characters and lowercases resulting tokens.
  • StopAnalyzer: Splits text at non-letter characters, lowercases resulting tokens, and removes stopwords. This analyzer is useful for pure text content and is not ideal if the content contains words with special characters such as product model number. This analyzer comes with a default set ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required