Chapter 9. Distributed Search Execution

Before moving on, we are going to take a detour and talk about how search is executed in a distributed environment. It is a bit more complicated than the basic create-read-update-delete (CRUD) requests that we discussed in Chapter 4.

A CRUD operation deals with a single document that has a unique combination of _index, _type, and routing values (which defaults to the document’s _id). This means that we know exactly which shard in the cluster holds that document.

Search requires a more complicated execution model because we don’t know which documents will match the query: they could be on any shard in the cluster. A search request has to consult a copy of every shard in the index or indices we’re interested in to see if they have any matching documents.

But finding all matching documents is only half the story. Results from multiple shards must be combined into a single sorted list before the search API can return a “page” of results. For this reason, search is executed in a two-phase process called query then fetch.

Query Phase

During the initial query phase, the query is broadcast to a shard ...

Get Elasticsearch: The Definitive Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Elasticsearch: The Definitive Guide by

Chapter 9. Distributed Search Execution

Query Phase

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly