Chapter 8. Resilience
This chapter focuses on application resilience, which is the ability to survive situations that might otherwise lead to failure. Unlike other chapters that focused on services external to the Node.js process, this one mostly looks within the process.
Applications should be resilient to certain types of failure. For example, there are many options available to a downstream service like web-api when it is unable to communicate with an upstream service like recipe-api. Perhaps it should retry the outgoing request, or maybe it should respond to the incoming request with an error. But in any case, crashing isn’t the best option. Similarly, if a connection to a stateful database is lost, the application should probably try to reconnect to it, while replying to incoming requests with an error. On the other hand, if a connection to a caching service is dropped, then the best action might be to reply to the client as usual, albeit in a slower, “degraded” manner.
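The options described above can be sketched in a few lines of Node.js. This is only an illustrative sketch, not code from the chapter: the helper names `withRetry`, `handleRequest`, and `readThroughCache`, the retry counts, and the delays are all assumptions, and `fetchRecipe`, `cacheGet`, and `loadFromSource` stand in for whatever functions actually talk to recipe-api, the cache, and the slower data source.

```javascript
// Hypothetical helper: retry an async operation a few times before giving up.
async function withRetry(operation, { attempts = 3, delayMs = 100 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Hypothetical handler logic for a downstream service such as web-api.
// `fetchRecipe` stands in for the outgoing request to recipe-api.
async function handleRequest(fetchRecipe) {
  try {
    const recipe = await withRetry(fetchRecipe, { attempts: 3, delayMs: 50 });
    return { status: 200, body: recipe };
  } catch (err) {
    // Reply to the incoming request with an error instead of crashing.
    return { status: 503, body: { error: 'upstream unavailable' } };
  }
}

// Degraded mode: if the cache is unreachable, fall through to the slower source.
async function readThroughCache(cacheGet, loadFromSource, key) {
  try {
    const cached = await cacheGet(key);
    if (cached !== undefined) return cached;
  } catch (err) {
    // A cache failure is treated as non-fatal; continue without the cache.
  }
  return loadFromSource(key);
}
```

In this sketch a failed upstream call becomes a 503 response and a failed cache lookup becomes a slower read, so neither failure takes the process down.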
In many cases it is necessary for an application to crash. If a failure occurs that an engineer doesn’t anticipate (often one that is global to the process and not associated with a single request), then the application can potentially enter a compromised state. In these situations it’s best to log the stack trace, leaving evidence behind for an engineer, and then exit. Due to the ephemeral nature of applications, it’s important that they remain stateless; doing so allows future instances to pick up where ...
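The log-then-exit approach for unanticipated failures might look roughly like the following. This is a minimal sketch under stated assumptions: `createFatalHandler` and `installFatalHandlers` are hypothetical names, and the injectable `log` and `exit` parameters exist only so the handler can be exercised without killing the process. The `uncaughtException` and `unhandledRejection` events are the standard Node.js process events.

```javascript
// Build a handler that logs the stack trace and then exits.
// `log` and `exit` are hypothetical injection points, defaulting to
// console.error and process.exit.
function createFatalHandler({
  log = console.error,
  exit = (code) => process.exit(code),
} = {}) {
  return function onFatal(err) {
    // Rejections may carry non-Error values, so fall back to a string.
    const stack = err instanceof Error ? err.stack : String(err);
    log(`fatal error, exiting: ${stack}`);
    // A nonzero exit code signals failure to whatever supervises the process.
    exit(1);
  };
}

// Register the handler for process-wide failures.
function installFatalHandlers(handler = createFatalHandler()) {
  process.on('uncaughtException', handler);
  process.on('unhandledRejection', handler);
}

// Typical usage, once at startup:
// installFatalHandlers();
```

The handler deliberately does not try to keep serving requests; it leaves evidence in the log and exits so that a fresh instance can replace the compromised one.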