Chapter 11. Inside a Shard
In Chapter 2, we introduced the shard and described it as a low-level worker unit. But what exactly is a shard, and how does it work? In this chapter, we answer these questions:
Why is search near real-time?
Why are document CRUD (create-read-update-delete) operations real-time?
How does Elasticsearch ensure that the changes you make are durable, that they won’t be lost if there is a power failure?
Why does deleting documents not free up space immediately?
What do the optimize APIs do, and when should you use them?
The easiest way to understand how a shard functions today is to start with a history lesson. We will look at the problems that needed to be solved in order to provide a distributed durable data store with near real-time search and analytics.
Making Text Searchable
The first challenge that had to be solved was how to make text searchable. Traditional databases store a single value per field, but this is insufficient for full-text search. Every word in a text field needs to be searchable, which means that the database needs to be able to index multiple values—words, ...