Chapter 11. Planning for Failure
Enterprise software must be cynical. Cynical software expects bad things to happen and is never surprised when they do. Cynical software doesn’t even trust itself, so it puts up internal barriers to protect itself from failures. It refuses to get too intimate with other systems, because it could get hurt.
Michael T. Nygard, Release It!, 2nd Edition (Pragmatic Bookshelf)
Plan for failure. This chapter gives you the basics of planning for operational failures, and the rest of the book will help you avoid and detect them, but remember this: failures are not anomalies. An anomaly is an occurrence that deviates from standard behavior, whereas failures are something you can reduce by planning for them as standard operating behavior. This is how you build a system that outlives the time you spend building it.
Introduction: Understand It, Even if You Don’t Manage It
It is not your responsibility to run managed services, but you need to understand how they work. Just because you can copy and paste some code into your cloud provider’s dashboard doesn’t mean you can stop thinking about Unix file permissions. Upload a deployment package to AWS Lambda that is not world-readable, and it won’t run. Compile a dependency on your macOS machine, and you get a nasty surprise when it doesn’t run on the serverless platform. “But I don’t compile dependencies for my dynamically run language!” Try to manipulate an image, use cryptography, or connect to ...
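The file-permission point is easy to check before you upload anything. The following is a minimal sketch, not an official tool: the function name `find_unreadable` and the idea of scanning a local build directory are assumptions made for illustration. It walks the directory and reports every file that lacks the world-readable bit, and every subdirectory that lacks the world-readable and world-executable bits needed to read and traverse it.

```python
import os
import stat


def find_unreadable(root):
    """Return sorted relative paths under root that other users cannot read.

    Hypothetical helper for illustration. Files need the "other read" bit;
    directories also need "other execute" so they can be traversed.
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if stat.S_ISDIR(mode):
                needed = stat.S_IROTH | stat.S_IXOTH
            else:
                needed = stat.S_IROTH
            # Flag the entry if any required permission bit is missing.
            if (mode & needed) != needed:
                problems.append(os.path.relpath(path, root))
    return sorted(problems)
```

Calling `find_unreadable("build")` on a hypothetical `build` directory returns an empty list when every entry is readable by others. Anything it does return is a candidate for `chmod` before you create the archive. The check only inspects the local files, so it assumes the tool you use to build the archive preserves those permission bits.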