Chapter 8. Resilience
This chapter focuses on application resilience: the ability of an application to survive situations that might otherwise lead to failure. Unlike other chapters, which focused on services external to the Node.js process, this one mostly looks within the process.
Applications should be resilient to certain types of failure. For example, there are many options available to a downstream service like web-api when it is unable to communicate with an upstream service like recipe-api. Perhaps it should retry the outgoing request, or maybe it should respond to the incoming request with an error. But in any case, crashing isn’t the best option. Similarly, if a connection to a stateful database is lost, the application should probably try to reconnect to it, while replying to incoming requests with an error. On the other hand, if a connection to a caching service is dropped, then the best action might be to reply to the client as usual, albeit in a slower, “degraded” manner.
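The options described above (retrying an outgoing request, replying with an error, and serving a slower "degraded" response when a cache is unavailable) can be sketched as follows. This is only a minimal illustration, not code from this book: the helper names `withRetry`, `handleRequest`, and `readWithOptionalCache`, the retry counts, the delay values, and the 503 status code are all assumptions chosen for the example.

```javascript
// Hypothetical helper: retry an async operation a fixed number of times,
// waiting delayMs between attempts. Rethrows the last error if all fail.
async function withRetry(operation, { attempts = 3, delayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Hypothetical handler: retry the upstream call, and if it still fails,
// answer the incoming request with an error instead of crashing.
async function handleRequest(fetchUpstream, retryOptions) {
  try {
    const body = await withRetry(fetchUpstream, retryOptions);
    return { status: 200, body };
  } catch (err) {
    return { status: 503, body: { error: 'upstream unavailable' } };
  }
}

// Hypothetical helper: treat the cache as optional. If the cache cannot be
// reached, load from the slower source and flag the response as degraded.
async function readWithOptionalCache(key, cache, loadFromSource) {
  try {
    const cached = await cache.get(key);
    if (cached !== undefined) {
      return { value: cached, degraded: false };
    }
  } catch (err) {
    // Cache is unreachable: fall back to the source of truth.
    return { value: await loadFromSource(key), degraded: true };
  }
  // Cache is healthy but had no entry for this key.
  return { value: await loadFromSource(key), degraded: false };
}
```

In this sketch a lost cache never produces an error for the client, while a lost upstream service produces an error response only after the retries are exhausted. Whether to retry at all, and how many times, depends on the request being safe to repeat.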
In many cases it is necessary for an application to crash. If a failure occurs that an engineer doesn’t anticipate—often global to the process and not associated with a single request—then the application can potentially enter a compromised state. In these situations it’s best to log the stack trace, leaving evidence behind for an engineer, and then exit. Due to the ephemeral nature of applications, it’s important that they remain stateless—doing so allows future instances to pick up where the last one left ...
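As a rough sketch of the "log the stack trace, then exit" approach, the following registers handlers for the `uncaughtException` and `unhandledRejection` events that Node.js emits on `process`. The function name `installCrashHandlers`, its injectable `proc` and `log` parameters, and the exit code of 1 are assumptions made for illustration, not the approach this chapter necessarily takes.

```javascript
// Hypothetical helper: on an unanticipated, process-wide failure, log the
// stack trace as evidence for an engineer and then exit with a non-zero code.
// `proc` and `log` are injectable only so the behavior can be exercised
// without terminating a real process.
function installCrashHandlers(proc = process, log = console.error) {
  const crash = (kind) => (err) => {
    // Rejections may carry non-Error values, which have no stack trace.
    const detail = err instanceof Error ? err.stack : String(err);
    log(`${kind}: ${detail}`);
    proc.exit(1);
  };
  proc.on('uncaughtException', crash('uncaughtException'));
  proc.on('unhandledRejection', crash('unhandledRejection'));
}

// Typical usage would be a single call at startup:
// installCrashHandlers();
```

Because the process exits rather than continuing in a possibly compromised state, this pattern relies on something outside the process (a supervisor or orchestrator) to start a replacement instance.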