Chapter 11. Inside a Shard
In Chapter 2, we introduced the shard and described it as a low-level worker unit. But what exactly is a shard, and how does it work? In this chapter, we answer these questions:
- Why is search near real-time?
- Why are document CRUD (create-read-update-delete) operations real-time?
- How does Elasticsearch ensure that the changes you make are durable, so that they won't be lost if there is a power failure?
- Why does deleting documents not free up space immediately?
- What do the refresh, flush, and optimize APIs do, and when should you use them?
The easiest way to understand how a shard functions today is to start with a history lesson. We will look at the problems that needed to be solved in order to provide a distributed, durable data store with near real-time search and analytics.
Making Text Searchable
The first challenge that had to be solved was how to make text searchable. Traditional databases store a single value per field, but this is insufficient for full-text search. Every word in a text field needs to be searchable, which means that the database needs to be able to index multiple values—words, ...
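The requirement to index many values (words) per field is commonly met with an inverted index, which maps each term to the set of documents that contain it. The following is a minimal toy sketch in Python, assuming a simple lowercase-and-split tokenizer; the helper names tokenize, build_inverted_index, and search are hypothetical, and this is not how Elasticsearch or Lucene actually store their index on disk.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical analyzer: lowercase the text and split on
    # anything that is not a letter or digit. No stemming is done.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document IDs whose text contains it,
    # so a single text field contributes many searchable values.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    # Return IDs of documents containing every query term (AND semantics).
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)
print(sorted(index["quick"]))              # [1, 2]
print(sorted(search(index, "brown fox")))  # [1]
```

Because this toy tokenizer does no stemming, "foxes" and "fox" are indexed as different terms, which is why the query "brown fox" matches only document 1.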