Chapter 6. The Write Stuff: Buffering and Flushing
Modern retrieval systems are not just fast readers of data; they are also sophisticated writers. Whether ingesting logs, crawling new content, or capturing ephemeral sensor streams, real-time indexing requires careful trade-offs among performance, durability, and concurrency. Unlike traditional transactional databases, which focus on strict consistency and durability semantics, retrieval engines must optimize for throughput and latency while maintaining sufficient consistency to ensure query results are predictable and correct. This chapter discusses the architectural foundations that enable this, focusing on how memory, buffers, and concurrency shape the path from incoming data to the persistent index. Figure 6-1 illustrates this write path at a high level, showing how incoming data flows through in-memory buffers, flushing boundaries, and multiple durability targets before becoming part of the persistent index.
Figure 6-1. The write path ...
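The buffered write path described above can be sketched in a few lines. This is a minimal illustration under assumed names (`WriteBuffer`, `flush_threshold`, JSONL "segment" files), not the implementation of any particular retrieval engine: incoming documents accumulate in memory, and crossing a flush threshold turns the buffer into a persistent on-disk segment.

```python
import json
from pathlib import Path

class WriteBuffer:
    """Sketch of an in-memory write buffer with a size-based flush boundary.

    All names here are illustrative assumptions, not a real engine's API.
    """

    def __init__(self, segment_dir: Path, flush_threshold: int = 4):
        self.segment_dir = segment_dir
        self.flush_threshold = flush_threshold
        self.buffer: list[dict] = []   # writes land here first, not on disk
        self.segment_count = 0

    def add(self, doc: dict) -> None:
        # Fast path: append in memory; durability is deferred to the flush.
        self.buffer.append(doc)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Flush boundary: buffered documents become a persistent segment file.
        if not self.buffer:
            return
        path = self.segment_dir / f"segment-{self.segment_count:05d}.jsonl"
        with path.open("w") as f:
            for doc in self.buffer:
                f.write(json.dumps(doc) + "\n")
        self.buffer.clear()
        self.segment_count += 1
```

The trade-off the chapter describes is visible in `flush_threshold`: a larger buffer raises throughput (fewer, bigger writes) but widens the window of undurable data, while a smaller one does the reverse.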