Graceful Routing Engine Switchover

Graceful Routing Engine Switchover (GRES) takes advantage of the separation of the control and forwarding planes in JUNOS software to provide system redundancy. GRES allows the control plane, in this case the Routing Engine (RE), to switch over to the backup RE without any interruption to the existing traffic flows in the Packet Forwarding Engine (PFE). When GRES is configured on the router, the kernel state on both REs is synchronized to preserve routing and forwarding state. Any change to the routing state on the primary RE results in an automatic incremental update of the kernel state on the backup RE. As you can see, the GRES concept is very simple.

However, the limitation of GRES is that, by itself, it cannot provide router redundancy. Even though traffic continues to flow through the router during a switchover between REs, the flow occurs for only a limited time. As soon as any of the protocol timers expire, the neighbor relationship between routers is dropped and traffic is stopped at the upstream router. To maintain high availability, networks must quickly discover when a neighbor goes down and must implement the lowest possible timers. Because GRES provides an intact forwarding plane, it is to our advantage not to drop any of the adjacencies, but rather to continue sending traffic toward the failed router.

The solution to this limitation is to use the Graceful Restart (GR) protocol extension. GR signals all supporting protocols that the failing router is capable of forwarding even though it is having control plane problems and needs help maintaining protocol adjacencies. GRES provides zero loss only when supplemented with the GR protocol running between a failed router and all its neighbors. For more about GR, see Graceful Restart.
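As a quick illustration, GR is enabled globally under the routing-options hierarchy, from which the individual routing protocols inherit it; this is a minimal sketch assuming a router that already has GRES configured:

[edit]
lab@r1# set routing-options graceful-restart

[edit]
lab@r1# show routing-options
graceful-restart;

Individual protocols can then be tuned, or GR selectively disabled, with their own graceful-restart statements.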

Note

As we will see later in this chapter, the key factor for using GR is to have a stable topology. Any topology change results in adjacency loss.

Implementation and Configuration

Enabling GRES on the router changes the information flow from the RE to the PFE. By default, without GRES enabled, the RE signals any state change or routing update directly to the PFE. When GRES is enabled, however, these changes must first be duplicated on the backup RE. Updating the backup RE first avoids corner cases in which the PFE and the backup RE could fall out of sync. For example, if the backup RE were updated last and the primary RE crashed after updating the PFE but before updating the backup RE, the backup RE and the PFE would be out of sync. The resulting problems would be immense, so the replication order ensures that this condition never happens.

Because state replication is a broad term, let’s examine what exactly is being replicated. From the user perspective, all the interfaces and their states are replicated so that the interfaces do not restart during a switchover. Additionally, all Layer 2 protocol states are preserved, such as ATM, Frame Relay, and Point-to-Point Protocol (PPP) states. All routes from the kernel table, as well as Address Resolution Protocol (ARP) entries and firewall states, are preserved as well. However, TCP state and the actual RPD routes are not preserved; they are rebuilt either locally by means of a kernel copy or through new neighbor discovery.

From the JUNOS perspective, three states are being replicated. Understanding them will help you to configure and troubleshoot the state replication process used in GRES. The three states are:

Configuration database

The configuration database is the repository of the router’s configuration files. Different daemons query this database as needed. The dcd daemon, which manages interfaces, queries it as an interface is brought online, while the chassisd daemon uses it when managing hardware components. The RPD also relies on this database, using the stored configuration information to control all routing protocols. To ensure that this state is preserved and the database is always in sync, you must use the commit synchronize command when committing a change in a redundant RE configuration. JUNOS displays an error when GRES is enabled but you fail to commit configuration changes with commit synchronize.
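For example, on a dual-RE system a change is committed to both REs as follows; the set system commit synchronize statement, which makes a plain commit behave like commit synchronize, is optional and shown here only as a convenience:

[edit]
lab@r1# commit synchronize
re0:
configuration check succeeds
re1:
commit complete
re0:
commit complete

[edit]
lab@r1# set system commit synchronize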

Kernel with all its entries

Configuring GRES starts the ksyncd daemon, which is responsible for all kernel state replications. ksyncd is a custom JUNOS daemon used only for replication tasks between different hardware components. Here, it is used to replicate the kernel state. ksyncd uses regular IPC Rtsock messages to carry information from the kernel on the primary RE to the kernel on the backup RE. When a GRES restart event occurs, the RPD starts up on the backup RE and it reads all saved routes from the kernel and puts them into routing tables as kernel routes (KRT). These routes stay active for a maximum of three minutes. Any routing changes resulting in routing updates, even those as simple as ARP entries, are signaled incrementally to the backup RE by means of ksyncd.
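You can confirm that ksyncd is running on the backup RE with the show system processes command; the output line below is only representative (your process ID and state flags will differ):

{backup}
lab@r1> show system processes | match ksyncd
 4564  ??  S      0:01.32 /usr/sbin/ksyncd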

PFE state

PFE state replication is done by chassisd. When a GRES restart event occurs, chassisd does a soft restart of all the hardware, querying it for an inventory. The hardware that responds is reattached to the backup RE and brought online without any disruption. However, if certain hardware fails to respond, it is restarted anyway. To make the replication process as efficient as possible, local user files, accounting information, logs, and traceoptions files are not replicated to the backup RE.
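The mastership and hardware state that chassisd tracks can be checked at any time with show chassis routing-engine; a trimmed sketch of the redundancy-related fields might look like this:

lab@r1> show chassis routing-engine
Routing Engine status:
  Slot 0:
    Current state                  Master
    Election priority              Master (default)
  Slot 1:
    Current state                  Backup
    Election priority              Backup (default)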

Figure 4-2 illustrates the state replication components and the flow of communication between them.

While the replication of all states is important, let’s look further at the role of kernel state replication, because all routing entries, including ARP-derived next hops, depend on successful replication of the kernel state. Once the user makes a change and commits it with the commit synchronize command, all the state replication takes place. All routing and next-hop entries derived from the RPD and found in the routing tables inet.0, inet6.0, mpls.0, and inet.3 are replicated from the primary RE’s kernel routing table to the backup RE’s kernel table. All these routes become “active” routes with duplicate forwarding entries in the PFE. The routes and forwarding entries stay active for about three minutes and then are purged. Once the RPD on the backup RE initializes itself, it populates its routing table with the existing kernel routes. At that point, the GRES event has successfully completed. From a high availability perspective, the RPD acquires the most up-to-date network state information and refreshes its routing tables. Figure 4-3 illustrates the GRES state replication process.
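One simple way to spot-check kernel route replication is to look up the same destination in the kernel’s forwarding table on both REs; the prefix below reuses the static route configured later in this chapter, and identical next-hop entries on both REs indicate successful replication:

lab@r1> show route forwarding-table destination 66.129.243.0/24

{backup}
lab@r1> show route forwarding-table destination 66.129.243.0/24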

State replication process

Figure 4-2. State replication process

GRES state replication process

Figure 4-3. GRES state replication process

Configuration examples

GRES is supported only on routers with two REs running the same JUNOS version. Use the show version command to verify the software version on each RE; by default, when you log in, you are on the master RE:

lab@r1> show version
Hostname: r1
Model: mx480
JUNOS Base OS boot [9.2R2.15]
JUNOS Base OS Software Suite [9.2R2.15]
JUNOS Kernel Software Suite [9.2R2.15]
JUNOS Crypto Software Suite [9.2R2.15]
JUNOS Packet Forwarding Engine Support (M/T Common) [9.2R2.15]
JUNOS Packet Forwarding Engine Support (MX Common) [9.2R2.15]
JUNOS Online Documentation [9.2R2.15]
JUNOS Routing Software Suite [9.2R2.15]

From the master RE, you can log in to the backup RE and verify its software version:

lab@r1> request routing-engine login other-routing-engine

--- JUNOS 9.2R2.15 built 2008-10-03 19:32:58 UTC

lab@r1> show version
Hostname: r1
Model: mx480
JUNOS Base OS boot [9.2R2.15]
JUNOS Base OS Software Suite [9.2R2.15]
JUNOS Kernel Software Suite [9.2R2.15]
JUNOS Crypto Software Suite [9.2R2.15]
JUNOS Packet Forwarding Engine Support (M/T Common) [9.2R2.15]
JUNOS Packet Forwarding Engine Support (MX Common) [9.2R2.15]
JUNOS Online Documentation [9.2R2.15]
JUNOS Routing Software Suite [9.2R2.15]

If the systems are running the same JUNOS version, you can configure GRES using the redundancy statements under the chassis hierarchy:

[edit]
lab@r1# set chassis redundancy graceful-switchover

[edit]
lab@r1# set chassis redundancy failover on-disk-failure

[edit]
lab@r1# set chassis redundancy failover on-loss-of-keepalives

[edit]
lab@r1# set chassis redundancy routing-engine 0 master

[edit]
lab@r1# set chassis redundancy routing-engine 1 backup

[edit]
lab@r1# show chassis redundancy
routing-engine 0 master;
routing-engine 1 backup;
failover {
    on-loss-of-keepalives;
    on-disk-failure;
}
graceful-switchover;

RE failure is one type of failure that GRES monitors. The primary RE sends keepalive messages every second to the backup RE’s kernel. If the backup RE does not receive keepalives for two consecutive seconds, it presumes that the primary RE has failed and attempts to acquire mastership by starting the RE failover process. Another type of failure handled by GRES is media corruption. The kernel is intelligent enough to recognize storage media problems, both with the hard drive and with the CompactFlash drive. Because JUNOS software is built on a modified FreeBSD kernel, it constantly writes to logfiles. When the kernel senses problems with writing to or reading from the hard drive, it registers this as an RE failure and starts a GRES event.
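To exercise this failover logic in a controlled way, you can also toggle mastership manually from the master RE with a standard operational command; confirm the prompt to proceed:

{master}
lab@r1> request chassis routing-engine master switch
Toggle mastership between routing engines ? [yes,no] (no) yes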

Note

In newer JUNOS releases, another configuration step is required to enable GRES: configuring the backup router. Although the backup router is not part of the redundancy mechanism, it is required so that the control plane remains reachable while an RE is booting, loading, or recovering a configuration. The backup router serves as a default gateway for the control plane, even though this route is never installed into the forwarding plane. Here is an example of configuring the backup router:

lab@re1# set system backup-router 184.12.1.12

Once GRES is enabled, verify its status by executing the show system switchover command from the backup RE:

{backup}
lab@r1> show system switchover
Graceful switchover: On
Configuration database: Ready
Kernel database: Version incompatible
Peer state: Out of transition

The preceding output shows that the router is running different JUNOS versions on the primary and backup REs. Once you upgrade JUNOS on the backup RE, verify GRES again:

{backup}
lab@r1> show system switchover
Graceful switchover: On
Configuration database: Ready
Kernel database: Ready
Peer state: Steady State

Troubleshooting GRES

As the previous section shows, the steps to implement GRES are simple. However, there wouldn’t be such a high demand for skilled IT employees if everything always worked as configured and expected. Sometimes it will be necessary to troubleshoot how the protocols are working in your network.

As an example, let’s consider a scenario in which the entire configuration is in place and the kernel replication states look fine, but during a lab RE switchover test you observe traffic loss. Because GRES is enabled, no packets should be dropped—at least, that’s what Juniper promised us. Well, let’s analyze the situation and see what is happening.

To troubleshoot the actual problem, first verify the configuration. Enabling traceoptions on all the protocols and GRES knobs helps us get to the root of the problem:

lab@r1# show protocols ospf
traceoptions {
    file ospf.trace;
    flag all detail;
}

[edit]
lab@r1# show protocols bgp
traceoptions {
    file bgp.trace;
    flag all detail;
}
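Once these trace files start filling, you can scan them directly from the CLI with the match and last pipe filters; the search strings here are only examples:

lab@r1> show log ospf.trace | match "neighbor|down|error"
lab@r1> show log bgp.trace | last 50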

Note

It is not recommended to have flag all turned on under any protocol’s traceoptions in a production network. Logging all protocol events takes a toll on RPD processing, which could jeopardize real-time protocol maintenance such as BGP keepalives or OSPF Hellos. flag all is acceptable in this case only because it is a lab environment.

Then check the traceoptions output with the show log command. If you see from the logs that the protocols are working correctly and that it was not a protocol error that caused the issue, the next step is to debug the actual GRES communication:

[edit]
lab@r1# show chassis
redundancy {
    routing-engine 0 master;
    routing-engine 1 backup;

    failover {
        on-loss-of-keepalives;
        on-disk-failure;
    }
    graceful-switchover {
        traceoptions {
            flag all;
        }
        enable;
    }
}

[edit]
lab@r1# show routing-options
static {
    route 66.129.243.0/24 next-hop 172.18.66.1;
}
forwarding-table {
    traceoptions {
        flag all detail;
    }
}

When you enable traceoptions in the GRES portion of the chassis redundancy configuration and on the forwarding table, JUNOS software begins placing debugging information about redundancy and the forwarding engine into logfiles, which you can then review.

Because ksyncd is the daemon responsible for kernel route replication, its logs usually reveal a lot of information about state, potential errors, misconfigurations, and software bugs. Check the logs with the following command:

lab@r1> show log ksyncd

The logs themselves contain much information that can point you in the proper direction when troubleshooting. The following code snippet shows log output from ksyncd:

lab@r1> show log ksyncd

Sep 10 23:27:09 Terminated: 15 signal received, posting signal
Sep 10 23:27:09 inspecting pending signals
Sep 10 23:27:09 SIGTERM posted, exiting
Sep 12 17:12:27 KSYNCD release 9.0R3.6 built by builder on 2008-08-01 05:05:43
UTC starting, pid 4564
Sep 12 17:12:27 Not runnable attempt 0 reason: not configured (errors: none)
Sep 12 17:12:27 Setting hw.re.is_slave_peer_gres_ready: 0

Near the end of the following output, an error message points to a configuration error, specifically that the commit sync command is not configured on the master RE:

lab@r1> show log ksyncd

Sep 15 21:57:52 KSYNCD release 9.1R2.10 built by builder on 2008-07-01
05:06:40 UTC starting, pid 4566
Sep 15 21:57:52 Not runnable attempt 0 reason: undefined mode (errors: none)
Sep 15 21:58:01 Terminated: 15 signal received, posting signal
Sep 15 21:58:01 inspecting pending signals
Sep 15 21:58:01 SIGTERM posted, exiting
Sep 18 02:17:16 KSYNCD release 9.1R2.10 built by builder on 2008-07-01 05:06:40 
UTC starting, pid 8998
Sep 18 02:17:16 Commit sync knob is NOT configured on master RE
Sep 18 02:17:16 Stop attempting to perform initial sync
Sep 18 02:17:16 config state: ready

Another problem that occurs quite often is that the REs on the router are running different versions of JUNOS software. In the ksyncd logfile, you see this as a version mismatch error. You can easily search for this message when parsing the logfiles:

lab@r1> show log ksyncd

Sep 18 02:17:17 Register timer to wait for version info from master
Sep 18 02:17:17 recv RE msg subtype RE_MSG_RTSOCK_VERSION_REPLY
Sep 18 02:17:17         RE_MSG_RTSOCK_VERSION_REPLY :
Sep 18 02:17:17                     rtm_n_msg_types : 0x00000059
Sep 18 02:17:17                rtm_version_checksum : 0x86496873
Sep 18 02:17:17 Received RTSOCK version message from master
Sep 18 02:17:17 Rtsock version checksum mismatch: master 2252957811, slave 1789492047
Sep 18 02:17:17 Version mismatch detected: Slumber time
Sep 18 02:17:17 Suspending due to unrecoverable error: version_mismatch
Sep 18 02:17:17 Not runnable attempt 0 reason: hard error (errors: version_mismatch )
Sep 18 02:17:17 closing connection to master
Sep 18 02:17:17 cleaning up kernel state
Sep 18 02:17:17 delete all commit proposals seqno 1658:
Sep 18 02:17:17 delete route rtb 0 af 2 rttype perm: skip not supported
Sep 18 02:17:17 delete route rtb 0 af 2 0.0.0.0 rttype perm: skip not supported
Sep 18 02:17:17 delete route rtb 0 af 2 66.129.243.0 rttype user: skip private

Sometimes issues may not be obvious in the ksyncd logs, or the log may not reveal anything specific. A further step in troubleshooting ksyncd is to log in to the backup RE and run the Unix command rtsockmon from the shell to view the actual route replication process:

lab@r1> start shell
% rtsockmon -t
         sender   flag    type        op
[20:07:40] rpd       P    route       add     inet 4.4.4.4 tid=0 
plen=32 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd       P    route       add     inet 172.24.95.208 tid=0 
plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd       P    route       add     inet 172.24.95.204 tid=0 
plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd       P    route       add     inet 172.24.231.248 tid=0 
plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd       P    route       add     inet 172.24.95.196 tid=0 
plen=30 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0
[20:07:40] rpd       P    route       add     inet 3.3.3.3 tid=0 
plen=32 type=user flags=0x10 nh=indr nhflags=0x4 nhidx=262142 filtidx=0

This output shows that the route replication process seems to be fine. If the configuration and logfiles do not reveal anything out of the ordinary, you next look at the bigger picture, such as networkwide issues. For example, investigate the protocol traceoptions logs to see whether an adjacency dropped during a GRES event on the failed router. If this appears to be the case, analyze the neighbor’s configuration for Graceful Restart extensions. If the neighbor’s configuration looks like the following, with no graceful-restart statement under routing-options, the neighbor is missing the GR configuration:

lab@r1# show routing-options
static {
    route 66.129.243.0/24 next-hop 172.18.66.1;
}

As you will see in the next section, the GRES concept alone is not sufficient to keep the network running during RE failures. GRES must be complemented with Graceful Restart.
