We mentioned earlier that all of Apache Lucene's data is stored in an inverted index. This transformation is required for successful response by Elasticsearch to search requests. The process of transforming this data is called analysis.

Elasticsearch has an index analysis module. It maps to the Lucene Analyzer. In general, analyzers are composed of a single Tokenizer and zero or more TokenFilters.


Analysis modules and analyzers will be discussed in depth in Chapter 4, Analysis and Analyzers.

Elasticsearch provides a lot of character filters, tokenizers, and token filters. For example, a character filter may be used to strip out HTML markup and a token filter may be used to modify tokens (for example, lowercase). You can combine them ...

Get Elasticsearch Indexing now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.