Tokenizer
An analyzer has exactly one tokenizer. The tokenizer receives a stream of characters and generates a stream of tokens, which are then used to build the inverted index. A token is roughly equivalent to a word. In addition to breaking the character stream into tokens, the tokenizer also records the start and end offsets of each token in the input stream.
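To make the idea of tokens and offsets concrete, here is a minimal sketch in Python. It is not how Elasticsearch implements any of its tokenizers; it simply splits on whitespace and records where each token starts and ends. The names Token and whitespace_tokenize are hypothetical and chosen for illustration only.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    """Hypothetical token record: the term plus its offsets in the input."""
    term: str
    start_offset: int  # index of the token's first character
    end_offset: int    # index one past the token's last character


def whitespace_tokenize(text: str) -> Iterator[Token]:
    """Yield one Token per run of non-whitespace characters."""
    for match in re.finditer(r"\S+", text):
        yield Token(match.group(), match.start(), match.end())


tokens = list(whitespace_tokenize("Quick brown fox"))
for token in tokens:
    print(token.term, token.start_offset, token.end_offset)
```

Because the offsets point back into the original text, slicing the input with them (text[start_offset:end_offset]) returns the token itself.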
Elasticsearch ships with a number of tokenizers that can be used to compose a custom analyzer; these tokenizers are also used by Elasticsearch itself to compose its built-in analyzers.
You can find a list of the available built-in tokenizers here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html.
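As a rough illustration of composing a custom analyzer from a built-in tokenizer, the sketch below builds an index settings body as a Python dictionary and prints it as JSON. The analyzer name my_custom_analyzer is a made-up placeholder, and the choice of the standard tokenizer with a lowercase token filter is an assumption for the example, not something prescribed by the text.

```python
import json

# Settings body for creating an index with one custom analyzer.
# "my_custom_analyzer" is a hypothetical name chosen for this example.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    # An analyzer has exactly one tokenizer.
                    "tokenizer": "standard",
                    # Token filters are optional; lowercase is assumed here.
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

print(json.dumps(index_settings, indent=2))
```

The printed JSON could be sent as the request body when creating an index; the tokenizer value can be swapped for any other built-in tokenizer from the list linked above.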
The Standard Tokenizer ...