To utilize the robust high availability toolkit provided in JUNOS, you must fully understand the software components of the RE and how they work together to build a highly available operating system. As we discussed in Chapter 3, JUNOS provides a clear separation between the forwarding and control planes. This separation creates an environment in which the router can continue to forward traffic even when its control plane is down. As long as traffic keeps flowing through the router, users do not experience any network-related issues.
The RE is the brain that stores the building blocks of system availability, providing all the necessary tools for routing protocols and route calculations. The main function of the RE is to perform route management, using a vastly modified Unix Routing Protocol Daemon (RPD). Because route management is a complex function, the RPD divides its work into many tasks and runs its own scheduler to prioritize them, ensuring that each protocol and route calculation receives the appropriate resources to perform its job.
The primary goal of the RPD is to create and maintain the Routing Information Base (RIB), which is a database of routing entries. Each routing entry consists of a destination address and some form of next hop information. RPD maintains the routing table and properly distributes routes from the routing table into the kernel and the hardware complexes used for traffic forwarding.
While almost all network equipment vendors use the concept of a RIB for Border Gateway Protocol (BGP), JUNOS uses a RIB-based structure for all of its routing tables. To understand routing for high availability in your network, it is important to know the table names and to understand the role of each table. Table 4-1 describes the JUNOS routing tables.
Table 4-1. Routing tables implemented in JUNOS
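Each of these tables can be inspected directly from the operational-mode CLI. A few representative commands (output omitted; the table names shown are the defaults discussed in this chapter):
lab@r1> show route summary
lab@r1> show route table inet.0
lab@r1> show route table bgp.l3vpn.0
The summary form lists every routing table currently instantiated on the router, along with its route counts.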
The RPD stores routes in these tables and moves routes among the tables as needed. For example, when the router receives routing information from a routing protocol in the form of newly advertised routes, such as a BGP update message, the routing update is stored in the table called RIB-IN. The RPD runs BGP import policies and the BGP best route selection algorithm on the received routes to create an ordered set of usable routes. The final results of the route selection process are stored in the main JUNOS RIB, inet.0. As BGP prepares to advertise routes to its peers, the export policy is run against the routes and the results are moved into the outgoing table, RIB-OUT.
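You can peek into a peer's RIB-IN and RIB-OUT views directly from the CLI. A quick sketch, using a placeholder peer address and with output omitted:
lab@r1> show route receive-protocol bgp 192.0.2.1
lab@r1> show route advertising-protocol bgp 192.0.2.1
The first command lists the routes received from that peer; the second lists the routes being advertised to it after export policy has been applied.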
The RPD stores routes for BGP-based Layer 3 VPNs in the table bgp.l3vpn.0, which is populated by Multiprotocol BGP (MP-BGP). As JUNOS software runs the configured policies against the information in the table, all acceptable routes are sent to one or more routing-instance tables, while any routing information that is unacceptable to the policies is marked as hidden.
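A couple of commands useful for examining this behavior (output omitted; bgp.l3vpn.0 is the default table described above):
lab@r1> show route table bgp.l3vpn.0
lab@r1> show route table bgp.l3vpn.0 hidden
The hidden form lists the L3VPN routes that the configured policies have excluded.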
After the RPD route selection process is finalized, RPD copies the selected routes into the kernel’s copy of the routing table using IPC messages. JUNOS does not rely on BSD’s default routing socket for IPC; instead, it uses a specialized socket that allows any daemon within the box to communicate with the kernel. For example, RPD uses routing sockets to signal the addition, deletion, or change of routes to the kernel. Similarly, the dcd daemon, which is responsible for interface management, communicates with the kernel over the same routing socket type when it signals the addition, deletion, or status change of an interface. Likewise, the chassisd daemon updates the kernel with any new or changed hardware status using the same routing socket type.
The protocol used for this IPC is the Trivial Network Protocol (TNP). TNP is a Layer 3 protocol (like IP) and uses Ethernet II encapsulation. Like any Layer 3 protocol, it uses source and destination addresses, and it forms and maintains neighbor relationships using Hello messages. The TNP Hello message carries Hello timers and dead intervals, which are used to detect the failure of other hardware components (REs, Packet Forwarding Engines or PFEs, and Flexible PIC Concentrators or FPCs). While you cannot configure most of the TNP Hello timers, you can configure the Hello and dead time intervals between the two REs through the command-line interface (CLI) keepalive command.
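As a minimal configuration sketch, assuming a dual-RE chassis, the RE-to-RE keepalive interval lives under the chassis redundancy hierarchy; the value shown here is arbitrary, and the exact statements and defaults vary by platform and JUNOS release:
[edit]
lab@r1# set chassis redundancy keepalive-time 10
lab@r1# set chassis redundancy failover on-loss-of-keepalives
The second statement asks the backup RE to take mastership if keepalives from the master stop arriving.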
Note
Although the netstat commands do work in JUNOS, because JUNOS uses a raw socket type, the IPC is not visible with the plain Unix netstat -a command. Communication is formed using rtsock messages and can be viewed using the rtsockmon command.
Continuing our look at the RE’s processes, the next step is that the kernel copies the RPD’s forwarding table to the PFE. The table structure is modified so that it can be stored in the PFE’s application-specific integrated circuits (ASICs), which make forwarding decisions. These decisions are based on proprietary Radix tree route lookups (called J-Tree route lookups) that are performed on each ASIC. As inet.0 and subsequent forwarding table routes are modified, the RE kernel incrementally updates the copy stored in the PFE that is used for forwarding. Because the microkernel of each ASIC contains all routes and their respective next hops, the actual forwarding process continues even when the primary RE is brought down, as shown in Figure 4-1. The fact that forwarding can continue while the control plane is unavailable, such as during an RE switchover, is important for understanding high availability solutions.
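One JUNOS feature that builds directly on this separation is graceful Routing Engine switchover, which keeps the PFE forwarding state intact across a switchover. A minimal configuration sketch, shown here for orientation only (the feature itself belongs with the high availability solutions discussed later):
[edit]
lab@r1# set chassis redundancy graceful-switchover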
To understand how you can use the routing features of JUNOS high availability in your network, it is best to actually visualize the routing update process discussed in the previous section. The following steps and code snippets walk through the control plane and forwarding plane interactions; they also serve as an excellent troubleshooting tool for diagnosing issues that might occur during network degradation.
During normal operation, one of the REs should be online and labeled as master. The master RE should have open, active connections with the rest of the hardware components. The following commands verify the state of the hardware, as seen by the RE.
lab@r1> show chassis routing-engine
Routing Engine status:
  Slot 0:
    Current state                  Master
    Election priority              Master (default)
    ......
    Uptime                         2 days, 1 hour, 55 minutes, 53 seconds
    Load averages:                 1 minute   5 minute  15 minute
                                       0.00       0.00       0.00
Routing Engine status:
  Slot 1:
    Current state                  Backup
    Election priority              Backup (default)
    ......
    Start time                     2008-09-17 05:59:19 UTC
    Uptime                         17 hours, 53 minutes, 38 seconds

lab@r1> show chassis fpc
                     Temp  CPU Utilization (%)   Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Online            21      7          0       1024       24         31
  1  Online            21      4          0       1024       24         31
  2  Online            22      5          0       1024       18         31
  3  Online            22      5          0       1024       17         31
  4  Empty
  5  Empty
All hardware components should now be online, and one of the REs should be listed in the Master state. If any hardware component is not online, you can start troubleshooting by examining the IPC performed by TNP between the different hardware components. Specifically, look at the connections between the RE and the PFE and make sure you see an RDP OPEN state for each online PFE component:
lab@r1> start shell
lab@r1% netstat -a -f tnp
Active TNP connections (including servers)
Proto Recv-Q Send-Q  Local Address        Foreign Address      (state)
<...>
rdp        0      0  master.pfed          feb0.46081           OPEN
rdp        0      0  master.chassisd      feb0.46080           OPEN
rdp        0      0  master.pfed          fpc2.24577           OPEN
rdp        0      0  master.chassisd      fpc2.24576           OPEN
udp        0      0  *.sampled            *.*
udp        0      0  *.sampled            *.*
rdp        0      0  *.1013               *.*                  LISTEN
rdp        0      0  *.chassisd           *.*                  LISTEN
If you see state issues in the previous step, research them further by monitoring the internal management interface:
lab@r1> monitor traffic interface em1
verbose output suppressed, use <detail> or <extensive> for full protocol decode
Listening on em1, capture size 96 bytes
02:51:40.239754 Out TNPv2 master.1021 > re1.1021: UDP, length 8
02:51:40.397159 In TNPv2 re1.1021 > re0.1021: UDP, length 8
02:51:41.249676 Out TNPv2 master.1021 > re1.1021: UDP, length 8
02:51:41.407092 In TNPv2 re1.1021 > re0.1021: UDP, length 8
02:51:42.259578 Out TNPv2 master.1021 > re1.1021: UDP, length 8
02:51:42.416900 In TNPv2 re1.1021 > re0.1021: UDP, length 8
02:51:43.269506 Out TNPv2 master.1021 > re1.1021: UDP, length 8
02:51:43.426834 In TNPv2 re1.1021 > re0.1021: UDP, length 8
Once the RE is online and you have configured a BGP neighbor, the next step is to verify the state of the BGP adjacency:
lab@r1> show bgp summary
Groups: 1 Peers: 1 Down peers: 0
Table Tot Paths Act Paths Suppressed History Damp State Pending
inet.0 28 28 0 0 0 0
bgp.l3vpn.0 13 13 0 0 0 0
bgp.mvpn.0 5 5 0 0 0 0
Peer            AS     InPkt   OutPkt   OutQ  Flaps Last Up/Dwn State|#Active/Received/Damped...
69.191.3.199 33181 68 24 0 0 8:21 Establ
inet.0: 28/28/0
bgp.l3vpn.0: 13/13/0
bgp.mvpn.0: 5/5/0
vpn.inet.0: 13/13/0
vpn.mvpn.0: 5/5/0
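If the session were stuck in Active or Connect rather than Establ, the per-neighbor view would give more detail, including the negotiated options, timers, and the last error recorded (output omitted):
lab@r1> show bgp neighbor 69.191.3.199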
Next, verify that the route updates are being received. The following command peeks into the BGP RIB-IN table:
lab@r1> show route receive-protocol bgp 69.191.3.199
inet.0: 64 destinations, 64 routes (63 active, 0 holddown, 1 hidden)
Prefix Nexthop MED Lclpref AS path
* 3.3.3.3/32 69.191.3.201 2000 100 13908 I
* 4.4.4.4/32 69.191.3.201 2000 100 13908 I
* 69.184.0.64/26 69.191.3.201 2025 100 13908 I
* 69.184.25.64/28 69.191.3.201 100 13908 I
* 69.184.25.80/28 69.191.3.201 100 13908 I
* 69.184.25.96/28 69.191.3.201 100 13908 I
* 101.0.0.0/30 69.191.3.201 100 13908 ?
* 128.23.224.4/30 69.191.3.201 100 13908 ?
... output truncated...
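Note the one hidden route reported in the table header. Hidden routes are those rejected by policy or left unusable by an unresolvable next hop; as a troubleshooting sketch (output omitted), you can list them and, with the extensive option, usually see why they are hidden:
lab@r1> show route hidden
lab@r1> show route hidden extensive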
The following CLI output proves that a route has gone through the BGP selection process and has been marked as active:
lab@r1> show route 3.3.3.3

inet.0: 64 destinations, 64 routes (63 active, 0 holddown, 1 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

3.3.3.3/32         *[BGP/170] 1w6d 21:55:50, MED 2000, localpref 100, from 69.191.3.199
                      AS path: 13908 I
                    > to 172.24.160.1 via fe-1/3/1.0
The following output gives more details about the actual BGP selection process, including the reasons the route was activated or deactivated, the BGP next hop, the physical next hop, and the state of the route:
lab@r1> show route 3.3.3.3 extensive

inet.0: 64 destinations, 64 routes (63 active, 0 holddown, 1 hidden)
3.3.3.3/32 (1 entry, 1 announced)
TSI:
KRT in-kernel 3.3.3.3/32 -> {indirect(262148)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Next-hop reference count: 75
                Source: 69.191.3.199
                Next hop type: Router, Next hop index: 462
                Next hop: 172.24.160.1 via fe-1/3/1.0, selected
                Protocol next hop: 69.191.3.201
                Indirect next hop: 89bb000 262148
                State: <Active Int Ext>
                Local AS: 33181 Peer AS: 33181
                Age: 1w6d 21:54:56      Metric: 2000    Metric2: 1001
                Announcement bits (2): 0-KRT 7-Resolve tree 2
                Task: BGP_33181.69.191.3.199+179
                AS path: 13908 I (Originator) Cluster list: 69.191.3.199
                AS path: Originator ID: 69.191.3.201
                Communities: 13908:5004
                Localpref: 100
                Router ID: 69.191.3.199
                Indirect next hops: 1
                        Protocol next hop: 69.191.3.201 Metric: 1001
                        Indirect next hop: 89bb000 262148
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 172.24.160.1 via fe-1/3/1.0
                        69.191.3.201/32 Originating RIB: inet.0
                                Metric: 1001    Node path count: 1
                                Forwarding nexthops: 1
                                        Nexthop: 172.24.160.1 via fe-1/3/1.0
This output shows the kernel copy of the forwarding table. This table, which is present on the RE, is sent to the PFE complex by means of routing update messages:
lab@r1> show route forwarding-table destination 3.3.3.3 extensive
Routing table: inet [Index 0]
Internet:
Destination: 3.3.3.3/32
Route type: user
Route reference: 0 Route interface-index: 0
Flags: sent to PFE, prefix load balance
Next-hop type: indirect Index: 262148 Reference: 26
Nexthop: 172.24.160.1
Next-hop type: unicast Index: 462 Reference: 47
Next-hop interface: fe-1/3/1.0
This step looks at the rtsock messages being used to replicate the kernel table into the PFE complex:
lab@r1> start shell
% rtsockmon -t
           sender  flag  type     op
[20:07:40] rpd     P     nexthop  add   inet 172.24.160.1 nh=indr flags=0x1 idx=262142 ifidx=68 filteridx=0
[20:07:40] rpd     P     route    add   inet 69.184.0.64 tid=0 plen=26 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 199.105.185.224 tid=0 plen=28 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 160.43.3.144 tid=0 plen=28 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 172.24.231.252 tid=0 plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 4.4.4.4 tid=0 plen=32 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 172.24.95.208 tid=0 plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 172.24.95.204 tid=0 plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 172.24.231.248 tid=0 plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 172.24.95.196 tid=0 plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 3.3.3.3 tid=0 plen=32 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     route    add   inet 160.43.175.0 tid=0 plen=27 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd     P     nexthop  add   inet 172.24.160.1 nh=ucst flags=0x85 idx=494 ifidx=68 filteridx=0
Using a VTY session to access the PFE complex, you can determine which actual forwarding entries are present in the PFE’s ASICs. The following output shows that the routing entry for destination 3.3.3.3 is present and has a valid next hop:
lab@r1> start shell
% su
Password:
root@r1% vty feb

CSBR platform (266Mhz PPC 603e processor, 128MB memory, 512KB flash)

CSBR0(r1 vty)# show route ip prefix 3.3.3.3 detail
IPv4 Route Table 0, default.0, 0x0:
Destination               NH IP Addr      Type     NH ID Interface
------------------------- --------------- -------- ----- ---------
3.3.3.3                   172.24.160.1    Indirect 262142 fe-1/3/1.0
  RT flags: 0x0010, Ignore: 0x00000000, COS index: 0, DCU id: 0, SCU id: 0
  RPF ifl list id: 0, RPF tree: 0x00000000
  PDP[0]:  0x00000000
  Second NH[0]:  0x00000000
The exercise in this section verified the state of the control plane and followed the routing update process from the initial establishment of a BGP peering session all the way to installing the routing entry into the forwarding ASIC on the PFE. Now that you understand the control and forwarding plane processes and their interactions, we can discuss the different high availability solutions available through JUNOS software.