Chapter 18. Getting Started with Languages

Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box support for many of the world’s most common languages:

Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.

These analyzers typically perform four roles:

Tokenize text into individual words:

The quick brown foxes → [The, quick, brown, foxes]

Lowercase tokens:

The → the

Remove common stopwords:

[The, quick, brown, foxes] → [quick, brown, foxes]

Stem tokens to their root form:

foxes → fox
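
The four roles above can be sketched as a toy pipeline. This is only an illustration of the order of operations, not how Elasticsearch implements its analyzers: the stopword list, the plural-stripping rule, and the function names (tokenize, stem, analyze) are all assumptions made for this example.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch);
# real analyzers ship much longer, language-specific lists.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text):
    # Split the text into runs of word characters.
    return re.findall(r"\w+", text)


def stem(token):
    # Naive plural stripping, not a real stemming algorithm:
    # "foxes" -> "fox", "dogs" -> "dog".
    if token.endswith(("xes", "ches", "shes", "sses")):
        return token[:-2]
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    tokens = tokenize(text)                              # 1. tokenize
    tokens = [t.lower() for t in tokens]                 # 2. lowercase
    tokens = [t for t in tokens if t not in STOPWORDS]   # 3. remove stopwords
    return [stem(t) for t in tokens]                     # 4. stem


print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

Lowercasing runs before stopword removal so that "The" matches the lowercase entry "the" in the stopword list.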

Each analyzer may also apply other transformations specific to its language in order to make words from that language more searchable:

The english analyzer removes the possessive 's:

John's → john

The french analyzer removes elisions like l' and qu' and diacritics like ¨ or ^:

l'église → eglis

The german analyzer normalizes terms, replacing ä and ae with a, or ß with ss, among others:

äußerst → ausserst
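
The language-specific transformations above can be approximated with a few string operations. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the elision prefixes are an illustrative subset, and the German rule covers only ß, umlauts, and the ae digraph. It also omits stemming, which is why the French example yields eglise rather than the fully analyzed eglis.

```python
import re
import unicodedata


def strip_english_possessive(token):
    # Drop a trailing 's written with a straight or curly apostrophe.
    return re.sub(r"['’]s$", "", token)


def strip_french_elision(token):
    # Drop a leading elided prefix such as l' or qu'.
    # The prefix list is an illustrative subset, not a complete one.
    return re.sub(r"^(?:l|d|j|m|n|s|t|c|qu)['’]", "", token, flags=re.IGNORECASE)


def fold_diacritics(token):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_german(token):
    # Replace ß with ss, fold umlauts (ä -> a), then collapse ae to a.
    folded = fold_diacritics(token.replace("ß", "ss"))
    return folded.replace("ae", "a")


print(strip_english_possessive("John's").lower())       # john
print(fold_diacritics(strip_french_elision("l'église")))  # eglise
print(normalize_german("äußerst"))                      # ausserst
```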

Using Language Analyzers

The built-in language analyzers are available globally and don’t need to be configured before being used. They can be specified directly in the field mapping:

PUT /my_index
{
  "mappings": {
    "blog": {
      "properties": {
        "title": {
          "type":     "string",
          "analyzer": "english"
        }
      }
    }
  }
}
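
When the same mapping is needed for several fields or languages, the request body can be generated rather than written by hand. The helper below is a minimal sketch: language_field_mapping is a hypothetical name, and it only builds the JSON body for the PUT request shown above, leaving the choice of HTTP client open.

```python
import json


def language_field_mapping(doc_type, field, analyzer):
    # Same nesting as the request body above:
    # mappings -> document type -> properties -> field.
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "string", "analyzer": analyzer}
                }
            }
        }
    }


body = language_field_mapping("blog", "title", "english")
print(json.dumps(body, indent=2))
```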
