Skip to Content
Elasticsearch: The Definitive Guide
book

Elasticsearch: The Definitive Guide

by Clinton Gormley, Zachary Tong
January 2015
Intermediate to advanced
724 pages
13h 21m
English
O'Reilly Media, Inc.
Content preview from Elasticsearch: The Definitive Guide

Chapter 34. Controlling Memory Use and Latency

Fielddata

Aggregations work via a data structure known as fielddata (briefly introduced in “Fielddata”). Fielddata is often the largest consumer of memory in an Elasticsearch cluster, so it is important to understand how it works.

Tip

Fielddata can be loaded on the fly into memory, or built at index time and stored on disk. Later, we will talk about on-disk fielddata in “Doc Values”. For now we will focus on in-memory fielddata, as it is currently the default mode of operation in Elasticsearch. This may well change in a future version.

Fielddata exists because inverted indices are efficient only for certain operations. The inverted index excels at finding documents that contain a term. It does not perform well in the opposite direction: determining which terms exist in a single document. Aggregations need this secondary access pattern.

Consider the following inverted index:

Term      Doc_1   Doc_2   Doc_3
------------------------------------
brown   |   X   |   X   |
dog     |   X   |       |   X
dogs    |       |   X   |   X
fox     |   X   |       |   X
foxes   |       |   X   |
in      |       |   X   |
jumped  |   X   |       |   X
lazy    |   X   |   X   |
leap    |       |   X   |
over    |   X   |   X   |   X
quick   |   X   |   X   |   X
summer  |       |   X   |
the     |   X   |       |   X
------------------------------------

If we want to compile a complete list of terms in any document that mentions brown, we might build a query like so:

GET /my_index/_search
{
  "query" : {
    "match" : {
      "body" : "brown"
    }
  },
  "aggs" : {
    "popular_terms": {
      "terms" : {
        "field" : "body"
      }
    }
  }
}

The query portion is easy ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Elasticsearch in Action

Elasticsearch in Action

Roy Russo, Matthew Lee Hinman, Radu Gheorghe
MongoDB: The Definitive Guide, 3rd Edition

MongoDB: The Definitive Guide, 3rd Edition

Shannon Bradshaw, Eoin Brazil, Kristina Chodorow

Publisher Resources

ISBN: 9781449358532Errata