Cloud Native Go

by Matthew A. Titmus
April 2021
Intermediate to advanced
433 pages
10h 45m
English
O'Reilly Media, Inc.

Chapter 9. Resilience

A distributed system is one in which the failure of a computer you didn’t even know about can render your own computer unusable.[1]

Leslie Lamport, DEC SRC Bulletin Board (May 1987)

Late one September night, at just after two in the morning, a portion of Amazon’s internal network quietly stopped working.[2] This event was brief, and not particularly interesting, except that it happened to affect a sizable number of the servers that supported the DynamoDB service.

Most days, this wouldn’t be such a big deal. Any affected servers would just try to reconnect to the cluster by retrieving their membership data from a dedicated metadata service. If that failed, they would temporarily take themselves offline and try again.

But this time, when the network was restored, a small army of storage servers simultaneously requested their membership data from the metadata service, overwhelming it so that requests—even ones from previously unaffected servers—started to time out. Storage servers dutifully responded to the timeouts by taking themselves offline and retrying (again), further stressing the metadata service, causing even more servers to go offline, and so on. Within minutes, the outage had spread to the entire cluster. The service was effectively down, taking a number of dependent services down with it.
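
As a rough illustration of the feedback loop described above, the following Go sketch simulates a service with a fixed per-tick capacity. The model is an assumption made for illustration, not a description of how DynamoDB or its metadata service actually behaves: the function name `simulate`, the numbers, and the rule that useful throughput falls once demand exceeds capacity are all hypothetical.

```go
package main

import "fmt"

// simulate returns the number of requests hitting a hypothetical
// metadata service on each tick.
//
//   capacity: requests the service can answer per tick when healthy
//   baseline: steady requests per tick from unaffected servers
//   burst:    extra requests arriving at tick 0 (reconnecting servers)
//
// Every request that is not answered in a tick times out and is
// retried on the next tick.
func simulate(capacity, baseline, burst, ticks int) []int {
	demand := make([]int, 0, ticks)
	retries := burst
	for t := 0; t < ticks; t++ {
		d := baseline + retries
		demand = append(demand, d)

		served := d
		if d > capacity {
			// Assumption: an overloaded service wastes effort on
			// requests that time out anyway, so the number of
			// requests it answers shrinks as demand grows.
			served = capacity * capacity / d
		}
		retries = d - served
	}
	return demand
}

func main() {
	// A small burst: the backlog drains and demand returns to baseline.
	fmt.Println("small burst:", simulate(100, 60, 50, 5))

	// A large burst: retries pile up faster than they are served.
	fmt.Println("large burst:", simulate(100, 60, 200, 5))
}
```

Under these assumed numbers, the small burst drains within a few ticks, while the large burst produces more demand on every tick even though the baseline load never changes. That is the self-reinforcing pattern the storage servers fell into.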

To make matters worse, the sheer volume of retry attempts—a “retry storm”—put such a burden on the metadata service that it even became entirely unresponsive to requests ...


ISBN: 9781492076322