Chapter 15. AI Inference and Serving

In the last few years, AI has become a key part of many different types of applications. Though the basics of neural networks and machine learning have been around for decades, recent advances in deep learning and large language models (LLMs) have created a phase shift in the quality of models and in the applications that AI makes possible. More crucially, these systems have captured the imagination of application developers around the world, who see limitless ways to apply LLMs to their particular business domains.

AI and machine learning are complex topics that can take years to master, but fortunately, with the assistance of libraries and pre-built models, it takes significantly less time to begin incorporating intelligence into your application. This chapter does not attempt to make you an AI expert, but it can serve as an introduction to the concepts and approaches for using AI in your system.

The Basics of AI Systems

Before we get started on the details of using AI in your system, it is useful to get a grounding in the core concepts that make up an AI application. The place where most people start is with a model. A model is a collection of numeric weights that encodes the knowledge in a neural network. In modern LLMs, there are billions or even trillions of these weights. As a rough definition, you can think of the model as a function that takes a collection of inputs and transforms them into some output.

Unlike a traditional function in a programming language, a model's behavior is not written down as explicit logic. Instead, it emerges from the weights themselves, which are learned from training data rather than authored by a developer.
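
To make this concrete, here is a minimal sketch of the "model as a function" idea: a toy single-layer network whose entire behavior is determined by its numeric weights. The sizes, values, and activation function here are hypothetical, chosen purely for illustration; a real LLM works the same way in principle, just with vastly more weights arranged in many layers.

import numpy as np

# A toy "model": all of its behavior lives in these numeric weights.
# The shapes and random values here are hypothetical, for illustration only.
rng = np.random.default_rng(seed=0)
weights = rng.standard_normal((4, 2))  # maps 4 inputs to 2 outputs
bias = rng.standard_normal(2)

def model(inputs):
    # Like a function, the model transforms inputs into an output, but the
    # transformation is determined by the stored weights rather than by
    # hand-written logic.
    return np.tanh(inputs @ weights + bias)

output = model(np.array([1.0, 0.5, -0.3, 2.0]))
print(output)  # two output values derived entirely from the weights

In practice you would load pre-trained weights and call the model through an inference library or API rather than building it by hand, but the mental model of "weights plus a transformation" carries through.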
