O'Reilly
Live Online training

Resilience and Fast Reroute in Computer Networks: Tools and Techniques to Optimize Network Performance

Russ White

From an application’s perspective, convergence events in a routing protocol show up as jitter followed by changed delay. Further, link and node failures in a network with high speed links can drop large amounts of traffic, impacting application performance. To resolve these problems, packet switched networks are designed to be resilient. There are, however, many different kinds of solutions, each of which is applicable to a range of problems, but none of which solves the entire set of resilience problems.

This training will provide an overview of many different solutions in the resilience space, including redundancy, BFD, graceful restart, IP-based local fast reroute, MPLS-based fast reroute, PIC, and others. The positive and negative aspects of each solution will be considered, including the complexity tradeoffs and how these solutions can be combined.

This training will also cover the concepts of MTBF, MTTR, and MTBM in order to provide the background for resilience.
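
As a rough illustration of how these measures relate, steady-state availability is commonly estimated as MTBF / (MTBF + MTTR), and redundant components are often modeled as failing independently. The sketch below is not taken from the course; the function names and sample figures are hypothetical, and the independence assumption rarely holds exactly in real networks (shared power, shared conduits, shared software).

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(*parts: float) -> float:
    """Availability of redundant components.

    Assumes failures are independent: the group is down only when
    every component is down at the same time.
    """
    all_down = 1.0
    for part in parts:
        all_down *= 1.0 - part
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical link: fails about every 999 hours, takes 1 hour to repair.
    link = availability(mtbf_hours=999.0, mttr_hours=1.0)
    print(f"single link: {link:.4f}")
    print(f"two links in parallel: {parallel_availability(link, link):.6f}")
```

Shortening MTTR raises availability in the same way that lengthening MTBF does, which is one reason faster convergence and faster repair both matter to resilience.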

What you'll learn and how you can apply it

By the end of this course, you will have a solid understanding of resilience in packet switched networks using distributed control planes, including IS-IS and BGP. You will also have a good sense of where to use different tools, and which tools do, or do not, work well together.

This training course is for you because...

  • You want to understand resilience in packet switched networks from a system, rather than solution-by-solution, perspective
  • You want to know how each method for providing resilience in a packet switched network works in depth
  • You want to understand the tradeoffs for solutions designed to provide resilience in packet switched networks
  • You want to understand how solutions designed to provide resilience in packet switched networks interact with mean time between failure, mean time to repair, and other measures of network availability


Prerequisites

  • Basic understanding of the principles of routing
  • A basic idea of the various protocols used in an IPv4/IPv6 network

About your instructor

  • Russ White began working with computers in the mid-1980s and computer networks in 1990. He has co-authored forty-seven software patents, participated in the development of several Internet standards, helped develop the CCDE and the CCAr, and worked in Internet governance with the Internet Society. Russ is a co-host of the History of Networking and Hedge podcasts, serves on the Routing Area Directorate at the IETF, co-chairs the BABEL working group, and serves on the Technical Services Council and as a maintainer on the open source FRRouting project. Russ holds an MSIT from Capella University, an MACM from Shepherds Theological Seminary, and is a PhD candidate in philosophy at SEBTS.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Segment 1: Introduction to Resilience in Packet Switched Networks (50 minutes)

  • Understanding, measuring, and calculating mean time between failures
  • The four-step process of network convergence
  • Where and how network convergence can be faster (based on the four-step process)
  • Redundancy as a solution
  • Analyzing tradeoffs in redundancy

10 Minute Break

Segment 2: Local Resilience Options (50 minutes)

  • BFD and fast hello processes
  • BGP fast fallover
  • Tuning flooding and SPF convergence time
  • Exponential backoff
  • Graceful restart
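
To give a feel for the exponential backoff topic listed above, here is a minimal sketch of a timer that reacts quickly to the first topology change and then doubles its delay while changes keep arriving. It is not the course's material or any vendor's implementation; the class name `SpfBackoff` and the default timer values are hypothetical.

```python
class SpfBackoff:
    """Hypothetical SPF scheduling timer with exponential backoff."""

    def __init__(self, initial_ms: int = 50, max_ms: int = 5000,
                 quiet_ms: int = 10000) -> None:
        self.initial_ms = initial_ms  # delay used for the first event
        self.max_ms = max_ms          # ceiling on the delay
        self.quiet_ms = quiet_ms      # quiet time that resets the timer
        self._current_ms = 0
        self._last_event_ms = None

    def next_delay(self, now_ms: int) -> int:
        """Return the delay before running SPF for an event at now_ms."""
        quiet = (self._last_event_ms is None
                 or now_ms - self._last_event_ms >= self.quiet_ms)
        if quiet:
            # Network was stable: react fast.
            self._current_ms = self.initial_ms
        else:
            # Events are still arriving: double the delay, up to the cap.
            self._current_ms = min(self._current_ms * 2, self.max_ms)
        self._last_event_ms = now_ms
        return self._current_ms


if __name__ == "__main__":
    timer = SpfBackoff(initial_ms=50, max_ms=400, quiet_ms=1000)
    for event_time in (0, 100, 200, 300, 400, 2000):
        print(event_time, timer.next_delay(event_time))
```

The tradeoff this models is the one named in the segment: a short initial delay speeds convergence for isolated failures, while the growing delay protects the control plane during a burst of changes.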

Segment 3: Fast Reroute Solutions (50 minutes)

  • Understanding and calculating loop-free alternates
  • Remote loop-free alternates
  • MPLS/TE fast reroute
  • Segment Routing fast reroute options
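
For a sense of what calculating a loop-free alternate involves, the sketch below applies the basic loop-free condition from RFC 5286: a neighbor N of source S can protect traffic to destination D if dist(N, D) < dist(N, S) + dist(S, D), meaning N's shortest path to D does not return through S. The topology, node names, and function names are hypothetical, link costs are assumed symmetric, and node protection and other refinements are left out.

```python
import heapq

INF = float("inf")


def shortest_paths(graph, source):
    """Dijkstra: cost of the shortest path from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, INF):
            continue  # stale heap entry
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, INF):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


def loop_free_alternates(graph, source, dest):
    """Neighbors of source that satisfy the basic loop-free condition."""
    from_source = shortest_paths(graph, source)
    best = from_source.get(dest, INF)
    alternates = []
    for neighbor, link_cost in graph[source].items():
        from_neighbor = shortest_paths(graph, neighbor)
        to_dest = from_neighbor.get(dest, INF)
        if link_cost + to_dest == best:
            continue  # primary next hop, not an alternate
        # dist(N, D) < dist(N, S) + dist(S, D)
        if to_dest < from_neighbor.get(source, INF) + best:
            alternates.append(neighbor)
    return sorted(alternates)


if __name__ == "__main__":
    # Hypothetical topology: S reaches D through A on the primary path.
    graph = {
        "S": {"A": 1, "B": 2, "C": 1},
        "A": {"S": 1, "D": 1},
        "B": {"S": 2, "D": 2},
        "C": {"S": 1, "D": 5},
        "D": {"A": 1, "B": 2, "C": 5},
    }
    print(loop_free_alternates(graph, "S", "D"))
```

In this topology B qualifies because its own shortest path to D avoids S, while C does not: C's best path to D goes back through S, so traffic sent to C during a failure would loop.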

Q&A: 10 minutes