Latency

Name: Latency
Author: Pekka Enberg
ISBN: 9781633438088

October 2025

Intermediate to advanced

264 pages

8h 30m

English

Manning Publications

Latency
copyright
contents
preface
acknowledgments
about this book
about the author
about the cover illustration
Part 1 Basics
1 Introduction
1.1 What is latency?
1.2 How is latency measured?
1.3 Why does latency matter?
1.3.1 User experience
1.3.2 Real-time systems
1.3.3 Efficiency
1.4 What latency is not
1.5 Latency vs. bandwidth
1.6 Latency vs. energy

2 Modeling and measuring latency
2.1 Laws of latency
2.1.1 Little’s law
2.1.2 Amdahl’s law
2.2 Latency distribution
2.3 Common sources of latency
2.3.1 Physics
2.3.2 CPU and hardware
2.3.3 Virtualization
2.3.4 Operating system, drivers, and firmware
2.3.5 Managed runtime
2.3.6 Application
2.4 Compounding latency
2.5 Measuring latency
2.6 Putting it together: Measuring network latency
2.6.1 Plotting with histograms
2.6.2 Plotting with eCDF
Part 2 Data
3 Colocation
3.1 Why colocate?
3.2 Internode latency
3.2.1 Geographical and last-mile latency
3.2.2 Edge computing and CDNs
3.3 Intranode latency
3.3.1 Network stack
3.3.2 TCP/IP protocol
3.3.3 Kernel-bypass networking
3.4 Multicore architecture
3.5 Putting it together: REST API with embedded database
4 Replication
4.1 Why replicate data?
4.2 Availability and scalability
4.3 Consistency model
4.3.1 Strong consistency
4.3.2 Eventual consistency
4.3.3 Other consistency models
4.4 Replication strategies
4.4.1 Single-leader replication
4.4.2 Multi-leader replication
4.4.3 Leaderless replication
4.4.4 Read-your-writes property
4.4.5 Local-first approach
4.5 Asynchronous vs. synchronous replication
4.6 State machine replication
4.7 Case study: Viewstamped Replication
4.8 Putting it together: Replicating a key–value store
5 Partitioning
5.1 Why partition data?
5.2 Physical partitioning strategies
5.2.1 Horizontal partitioning
5.2.2 Vertical partitioning
5.2.3 Hybrid partitioning
5.3 Logical partitioning strategies
5.3.1 Functional partitioning
5.3.2 Geographical partitioning
5.3.3 User-based partitioning
5.3.4 Time-based partitioning
5.3.5 Overpartitioning
5.4 Request routing
5.4.1 Direct routing
5.4.2 Proxy routing
5.4.3 Forward routing
5.5 Partition imbalance
5.5.1 Hot partitions
5.5.2 Skewed workloads
5.6 Putting it together: Horizontal partitioning with SQLite
6 Caching
6.1 Why cache data?
6.2 Caching overview
6.3 Caching strategies
6.3.1 Cache-aside caching
6.3.2 Read-through caching
6.3.3 Write-through caching
6.3.4 Write-behind caching
6.3.5 Client-side caching
6.3.6 Distributed caching
6.4 Cache coherency
6.5 Cache hit ratio
6.6 Cache replacement
6.6.1 Least recently used (LRU)
6.6.2 Least frequently used (LFU)
6.6.3 First-in, first-out (FIFO) and SIEVE
6.7 Time-to-live (TTL)
6.8 Materialized views
6.9 Memoization
6.10 Putting it together: In-application caching with Moka
Part 3 Compute
7 Eliminating work
7.1 Ways of eliminating work
7.2 Algorithmic complexity
7.3 Serializing and deserializing
7.4 Memory management
7.4.1 Dynamic memory allocation
7.4.2 Garbage collection
7.4.3 Virtual and physical memory
7.4.4 Demand paging
7.4.5 Memory topology
7.5 Operating system overhead
7.5.1 Scheduling delay and context switching
7.5.2 Background tasks and interrupts
7.5.3 Network stack
7.6 Precomputation
7.7 Putting it together: Benchmarking with Criterion
8 Wait-free synchronization
8.1 Mutual exclusion
8.1.1 Mutexes
8.1.2 Read–write locks
8.1.3 Spinlocks
8.2 Problems with mutual exclusion
8.2.1 Inefficiency
8.2.2 Priority inversion
8.2.3 Convoying
8.2.4 Deadlocks
8.3 Atomics
8.3.1 Atomic operations
8.3.2 Anatomy of a spinlock
8.4 Memory barriers
8.4.1 Types of memory barriers
8.4.2 Compiler barriers
8.4.3 Memory reordering example
8.5 Wait-free synchronization
8.5.1 Progress conditions
8.5.2 Consensus number
8.5.3 Wait-free queues
8.5.4 Wait-free stacks
8.5.5 Wait-free linked lists
8.6 Putting it together: Building a single-producer, single-consumer queue
9 Exploiting concurrency
9.1 Concurrency and parallelism
9.2 Concurrency models
9.2.1 Threads
9.2.2 Fibers
9.2.3 Coroutines
9.2.4 Event-driven concurrency
9.2.5 Futures and promises
9.2.6 Actor model
9.3 Parallel processing
9.3.1 Data parallelism
9.3.2 Task parallelism
9.4 Transactions
9.4.1 Serializability
9.4.2 Snapshot isolation
9.4.3 Data anomalies and weaker isolation
9.5 Concurrency control
9.5.1 Two-phase locking
9.5.2 Multiversion concurrency control
9.6 Putting it together: Sequential vs. concurrent execution
Part 4 Hiding latency
10 Asynchronous processing
10.1 Fundamentals
10.1.1 Asynchronous vs. synchronous processing
10.1.2 The event loop
10.1.3 Challenges
10.2 Asynchronous I/O
10.2.1 I/O multiplexing
10.2.2 Request batching
10.2.3 Request hedging
10.2.4 Buffered I/O
10.2.5 Memory mapping
10.3 Deferring work
10.3.1 Task scheduling
10.3.2 Priority queues
10.3.3 Work stealing
10.4 Resource management
10.4.1 Thread pools
10.4.2 Memory pools
10.4.3 Connection pools
10.5 Managing concurrency with backpressure
10.5.1 Controlling the producer
10.5.2 Buffering
10.5.3 Dropping and rate limiting
10.6 Error handling
10.6.1 Partial errors
10.6.2 Recovery
10.6.3 Timeouts and cancellation
10.7 Observability
10.7.1 Tracing
10.7.2 Metrics
11 Predictive techniques
11.1 Introduction to predictive techniques
11.2 Prefetching
11.2.1 Pattern-based prefetching
11.2.2 Semantic prefetching
11.3 Optimistic updates
11.3.1 Optimistic view
11.3.2 Synchronizing optimistic updates
11.3.3 Consistency guarantees
11.3.4 Error handling and rollbacks
11.4 Speculative execution
11.4.1 Incremental computation
11.4.2 Parallel speculation
11.4.3 Value prediction
11.5 Predictive resource allocation
11.5.1 Overprovisioning
11.5.2 Prewarming
appendix Further reading

Content preview from Latency

8 Wait-free synchronization

This chapter covers

Understanding synchronization and mutual exclusion
Working with atomics and memory barriers
Building your own wait-free data structures

In the previous chapter, we explored typical sources of redundant work and strategies to eliminate them, thereby reducing latency. However, optimizing the use of a single CPU will only sometimes suffice to meet stringent latency requirements. In such cases, using the parallelism offered by multiple CPUs becomes crucial. If your application allows for data partitioning—a technique discussed in chapter 5 that involves dividing data into independent chunks—you can scale your performance by adding more CPUs. This approach can significantly optimize latency in ...


