To prevent cascading failures, put fail-fast mechanisms such as circuit breakers in place. A retry policy for failed requests should answer questions such as:
How long should we wait to retry?
Should we monitor the endpoint and retry only once it is back online?
When do we notify devops about the failure?
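One way to make these questions concrete is a minimal sketch of a circuit breaker combined with exponential backoff. Everything here is illustrative rather than taken from a specific library: the names `CircuitBreaker`, `CircuitOpenError`, `call_with_retries`, and the `on_open` notification hook are hypothetical, and the thresholds and delays are placeholder values that would need tuning for a real service.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the endpoint."""


class CircuitBreaker:
    """Hypothetical fail-fast breaker; thresholds are illustrative assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic, on_open=None):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.on_open = on_open                      # e.g. page the on-call team
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast instead of hitting the endpoint.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result

    def _record_failure(self):
        self.failures += 1
        was_open = self.opened_at is not None
        if was_open or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
            # Notify only on the closed-to-open transition, not on every
            # failed trial call while the endpoint stays down.
            if not was_open and self.on_open is not None:
                self.on_open(self.failures)


def call_with_retries(breaker, func, max_attempts=4, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Retry func through the breaker, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # the breaker is open, so retrying now would defeat fail-fast
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

In this sketch, the backoff delay addresses how long to wait before retrying, the `reset_timeout` followed by a single trial call stands in for monitoring the endpoint until it recovers, and the `on_open` callback marks the point at which operations staff would be notified.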