Chapter 21. Egress Service Fast Restoration

When a primary egress PE fails, preinstalling the next hop associated with the backup egress PE reduces the failover time from seconds to a few hundred milliseconds. BGP convergence is no longer a contributing factor, because the second BGP next hop is already installed in the FIB.
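To make the idea concrete, here is a minimal Python sketch of a FIB whose entries carry both a primary and a preinstalled backup next hop. It is only an illustration of the principle, not how any router implements it: the `FibEntry` class, the `mark_next_hop_down` helper, and all prefixes and addresses are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FibEntry:
    """Hypothetical FIB entry for one service prefix."""
    prefix: str
    primary: str                 # BGP next hop of the primary egress PE
    backup: Optional[str]        # BGP next hop of the backup egress PE, preinstalled
    primary_up: bool = True

    def active_next_hop(self) -> Optional[str]:
        # Forward via the primary while it is considered reachable.
        if self.primary_up:
            return self.primary
        # Otherwise use the backup that is already present in the entry.
        return self.backup


def mark_next_hop_down(fib: List[FibEntry], failed_next_hop: str) -> int:
    """Mark the primary as down in every entry that uses the failed next hop.

    Returns the number of entries that switched to their backup. No new
    best-path computation is needed, because the backup is already installed.
    """
    changed = 0
    for entry in fib:
        if entry.primary == failed_next_hop and entry.primary_up:
            entry.primary_up = False
            changed += 1
    return changed


def build_example_fib() -> List[FibEntry]:
    # Illustrative values only (documentation address ranges).
    return [
        FibEntry("10.1.0.0/24", primary="192.0.2.1", backup="192.0.2.2"),
        FibEntry("10.2.0.0/24", primary="192.0.2.1", backup="192.0.2.2"),
        FibEntry("10.3.0.0/24", primary="192.0.2.3", backup=None),
    ]
```

In this sketch, calling `mark_next_hop_down(fib, "192.0.2.1")` switches the first two prefixes to `192.0.2.2` without touching the third. The remaining delay is the time it takes to learn that `192.0.2.1` has failed, not the time to compute a new path.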

However, IGP convergence still contributes to the overall failover time, because the ingress PE must discover the failure of the primary egress PE before it can remove the associated next hop from the FIB. To bring the detection time below the few hundred milliseconds that IGP convergence takes, you could deploy next-hop tracking or BGP session liveness detection mechanisms (using, for example, multihop Bidirectional Forwarding Detection [BFD]) with very aggressive timers. Very aggressive timers on multihop BFD sessions are, however, a questionable solution from a deployment (scaling) perspective, especially in large-scale networks, where a large number of such BFD sessions would be required.
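A rough back-of-the-envelope calculation illustrates the scaling concern. The sketch below assumes the simplified model in which BFD detection time is the transmit interval multiplied by the detect multiplier, and it ignores transmit jitter and any hardware offload. The function names and the example figures (50 ms interval, multiplier of 3, 1,000 sessions) are assumptions made for illustration, not values from the text.

```python
def bfd_detection_time_ms(interval_ms: float, multiplier: int) -> float:
    """Approximate detection time: interval times detect multiplier."""
    return interval_ms * multiplier


def bfd_tx_packets_per_second(sessions: int, interval_ms: float) -> float:
    """Approximate BFD control packets sent per second across all sessions.

    Each session sends one packet per interval; the same load arrives
    in the receive direction.
    """
    return sessions * (1000.0 / interval_ms)


# Hypothetical example figures, chosen only to show the trade-off.
example_detection_ms = bfd_detection_time_ms(interval_ms=50, multiplier=3)
example_tx_pps = bfd_tx_packets_per_second(sessions=1000, interval_ms=50)
```

With these assumed figures, detection takes about 150 ms, but a node terminating 1,000 such sessions must send roughly 20,000 BFD packets per second and process a similar number on receipt. Shortening the interval improves detection time and raises the packet load proportionally.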

So, what can you do? The answer is to move the duty of fixing the problem from the ingress PE (which is potentially far away from the egress PE) to the network node closest to the egress PE. If that node (let’s call it the Point of Local Repair [PLR]) directly connects to the egress PE, it can discover a failure of the egress PE very quickly, without the need for IGP convergence. Upon failure of the primary egress PE, the PLR node redirects the traffic. Therefore, traffic is ...
