7 Text analysis

This chapter covers

  • Overview of text analysis
  • Anatomy of an analyzer
  • Built-in analyzers
  • Developing custom analyzers
  • Understanding tokenizers
  • Learning about character and token filters

Elasticsearch does a lot of ground (and grunt) work behind the scenes on incoming textual data. It prepares the data so that it can be stored and searched efficiently. In a nutshell, Elasticsearch cleans text fields, breaks the text into individual tokens, and enriches the tokens before storing them in inverted indexes. When a search query is carried out, the query string is matched against the stored tokens, and any matches are retrieved and scored. This process of breaking the text into individual tokens and storing them in internal memory structures is ...
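
To make the clean, tokenize, enrich, and index flow described above concrete, here is a minimal Python sketch of a toy analyzer and inverted index. It is an illustration only and is not how Elasticsearch or Lucene implements analysis. The names `analyze`, `build_index`, `search`, and `STOP_WORDS`, the sample documents, and the "count of matching tokens" score are all assumptions made for this example. Lowercasing and stop-word removal stand in for the richer token enrichment a real analyzer can perform.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, chosen for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to"}


def analyze(text):
    """Clean the text, split it into tokens, then normalize the tokens."""
    # Cleaning step: replace HTML-like tags with a space.
    cleaned = re.sub(r"<[^>]+>", " ", text)
    # Tokenizing step: split on anything that is not a word character.
    tokens = re.findall(r"\w+", cleaned)
    # Normalizing step: lowercase every token and drop stop words.
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


def build_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Analyze the query the same way, then score documents by matched tokens."""
    scores = defaultdict(int)
    for token in analyze(query):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample documents.
docs = {
    1: "<p>The Quick Brown Fox</p>",
    2: "A quick guide to Elasticsearch",
    3: "Brown bears in the forest",
}
index = build_index(docs)
results = search(index, "quick brown")
print(results)  # [(1, 2), (2, 1), (3, 1)]
```

Note that the query string passes through the same `analyze` function as the documents. This is why the lowercase query "quick brown" matches the capitalized "Quick Brown" in document 1: both sides are reduced to the same tokens before they are compared.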

Get Elasticsearch in Action, Second Edition now with the O’Reilly learning platform.
