July 2019
Intermediate to advanced
502 pages
14h
English
Embracing failure means recognizing that components will fail all the time in a large system. This is not an unusual situation. You want to minimize component failures because each failure has various costs, even if the system as a whole continues to work. But it will happen. Most component failures can be handled either automatically or without urgency by having redundancy in place. However, systems evolve all the time and most systems are not in the perfect position where every component failure has mitigation in place for each type of failure. As a result, theoretically preventable component failures might become system failures. For example, if you write your logs to a local disk and you don't rotate your log ...