Troubleshooting EIGRP

EIGRP can be difficult to troubleshoot because of its complexity. As a reminder, the best preparation for troubleshooting a network is to be familiar with the network and its state during normal (trouble-free) conditions. Become familiar with the routing tables, their sizes, the summarization points, routing timers, etc. Also, plan ahead with “what-if” scenarios. What if router X failed or link Y dropped? How would connectivity recover? Will all the routes still be in every router’s table? Will the routes still be summarized?

Perhaps the second-best preparation for troubleshooting a network is the ability to track network implementations and changes. If changes are made haphazardly with no central control, an implementation team may walk away from a change unaware that it caused an outage, and the troubleshooting team may then spend hours, or even days, unraveling the events that led to the outage. Besides making the network more vulnerable, such loose methods of network operation create bad relationships between teams.

The following sections are a partial list of network states/conditions to check when looking for clues to routing problems in EIGRP.

Verifying Neighbor Relationships

If a router is unable to establish a stable relationship with its neighbors, it cannot exchange routes with those neighbors. The neighbor table can help check the integrity of neighbor relationships. Here is a sample of NewYork’s neighbor table:

NewYork#sh ip eigrp neighbor
IP-EIGRP neighbors for process 10
H   Address                 Interface   Hold Uptime   SRTT   RTO  Q  Seq
                                        (sec)         (ms)       Cnt Num
1   172.16.251.2            Se0/1         10 00:17:08   28  2604  0  7
0   172.16.250.2            Se0/0         13 00:24:43   12  2604  0  14

First, check that the neighbor count matches the number of EIGRP speakers. If routers A, B, and C share an Ethernet segment and run EIGRP 10, all three routers should see each other in their neighbor tables. If router C is consistently missing from A’s and B’s tables, there may be a physical problem with C, or C may be misconfigured (check C’s IP address and EIGRP configuration). Next, look for one-way neighbor relationships: is C in A’s and B’s tables while A and B are missing from C’s table? This could indicate a physical problem with C’s connection or a filter that is blocking EIGRP packets.
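The two checks above (a router nobody else sees, and one-way relationships) can be automated once the neighbor tables have been collected. The following is a minimal Python sketch, assuming you have already reduced each router’s show ip eigrp neighbor output to a set of neighbor names; the function name audit_adjacencies and the router names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def audit_adjacencies(tables: Dict[str, Set[str]]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Check routers that are all expected to be EIGRP neighbors on one segment.

    tables: router -> set of routers it lists in its EIGRP neighbor table
            (hypothetical input, gathered from "show ip eigrp neighbor").
    Returns (missing, one_way):
      missing: routers that no other router lists as a neighbor
      one_way: (a, b) pairs where a lists b but b does not list a
    """
    routers = set(tables)
    missing = sorted(
        r for r in routers
        if not any(r in tables[other] for other in routers - {r})
    )
    one_way = sorted(
        (a, b)
        for a in routers
        for b in tables[a]
        if b in tables and a not in tables[b]
    )
    return missing, one_way


# C sees A and B, but neither A nor B sees C.
missing, one_way = audit_adjacencies({"A": {"B"}, "B": {"A"}, "C": {"A", "B"}})
print(missing)   # ['C']
print(one_way)   # [('C', 'A'), ('C', 'B')]
```

Here C is both "missing" from A and B and in a one-way relationship with each, which points at C’s physical connection or at a filter, as described above.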

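The Hold column shows how many seconds remain before the neighbor is declared down. On most interfaces it starts at the hold time (15 seconds by default) and is reset each time a hello arrives, so with the default 5-second hello interval it should stay between about 10 and 15 seconds. If it regularly falls well below that range, the network may be congested and losing hellos. Increasing the hello-interval/hold-time may be a quick fix to the problem.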

The uptime should reflect the duration that the routers have been up. A low uptime indicates that the neighbor relationship is being lost and reestablished.

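The Q Cnt should be 0 (or at least should not exceed 0 on a consistent basis). A persistently nonzero value means EIGRP packets are waiting in the queue to be sent to that neighbor.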
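The Hold, Uptime, and Q Cnt checks can be applied mechanically to the neighbor table. The following Python sketch parses output in the format of NewYork’s table shown earlier and flags suspicious neighbors. The thresholds are assumptions: a minimum Hold of 10 seconds presumes the default 5-second hello and 15-second hold time, and the uptime check compares each adjacency against the router’s own uptime, which you would supply from show version. A single snapshot proves little; run it repeatedly and look for consistent patterns.

```python
import re
from dataclasses import dataclass
from typing import List

# One data row of "show ip eigrp neighbor" in the format shown above:
# H  Address  Interface  Hold  Uptime  SRTT  RTO  Q-Cnt  Seq-Num
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\d+)\s+(\S+)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


@dataclass
class Neighbor:
    address: str
    interface: str
    hold: int       # seconds left before the neighbor is declared down
    uptime_s: int   # -1 when uptime is not hh:mm:ss (e.g. "2d03h")
    q_count: int    # EIGRP packets queued for this neighbor


def parse_uptime(text: str) -> int:
    parts = text.split(":")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        h, m, s = (int(p) for p in parts)
        return h * 3600 + m * 60 + s
    return -1  # day/week formats: treated as "up for a long time"


def parse_neighbors(output: str) -> List[Neighbor]:
    neighbors = []
    for line in output.splitlines():
        m = ROW.match(line)  # header lines do not start with a number
        if m:
            neighbors.append(Neighbor(m.group(2), m.group(3), int(m.group(4)),
                                      parse_uptime(m.group(5)), int(m.group(8))))
    return neighbors


def flag(n: Neighbor, router_uptime_s: int, min_hold: int = 10,
         slack_s: int = 300) -> List[str]:
    """Return warnings for one neighbor; all thresholds are illustrative assumptions."""
    issues = []
    if n.hold < min_hold:
        issues.append("low hold: hellos may be getting lost")
    if 0 <= n.uptime_s < router_uptime_s - slack_s:
        issues.append("low uptime: adjacency may be flapping")
    if n.q_count > 0:
        issues.append("nonzero Q count: packets waiting to be sent")
    return issues


SAMPLE = """IP-EIGRP neighbors for process 10
H   Address                 Interface   Hold Uptime   SRTT   RTO  Q  Seq
                                        (sec)         (ms)       Cnt Num
1   172.16.251.2            Se0/1         10 00:17:08   28  2604  0  7
0   172.16.250.2            Se0/0         13 00:24:43   12  2604  0  14
"""

for n in parse_neighbors(SAMPLE):
    # Assume, for illustration only, that NewYork itself has been up 1500 seconds.
    print(n.address, flag(n, router_uptime_s=1500))
```

With that assumed router uptime, the Se0/1 neighbor is flagged because its adjacency came up roughly eight minutes after the router did, while the Se0/0 neighbor passes all three checks.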

In summary, if a problem is found in the neighbor relationship, you should do the following:

  1. Check for bad physical infrastructure.

  2. Ensure that router ports are plugged into the correct hubs.

  3. Check for filters blocking EIGRP packets.

  4. Verify router configurations: check IP addresses, masks, EIGRP AS numbers, and the network numbers defined under EIGRP.

  5. Increase the hello-interval/hold-time on congested networks.
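Step 4 amounts to a handful of comparisons between the two ends of a link. The following Python sketch performs them with the standard ipaddress module. The RouterSide structure is a hypothetical summary of each router’s relevant configuration, and classful network statements (for example, network 172.16.0.0) are assumed to have been written out as prefixes such as 172.16.0.0/16.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class RouterSide:
    """Hypothetical summary of one router's configuration for a shared link."""
    name: str
    asn: int        # from "router eigrp <asn>"
    interface: str  # interface address and mask, e.g. "172.16.251.1/30"
    networks: List[str] = field(default_factory=list)  # EIGRP network statements as prefixes


def check_link(a: RouterSide, b: RouterSide) -> List[str]:
    """Return configuration mismatches that would prevent an EIGRP adjacency."""
    problems = []
    ia = ipaddress.ip_interface(a.interface)
    ib = ipaddress.ip_interface(b.interface)
    if a.asn != b.asn:
        problems.append(f"AS mismatch: {a.name}={a.asn}, {b.name}={b.asn}")
    if ia.network.prefixlen != ib.network.prefixlen:
        problems.append(f"mask mismatch: /{ia.network.prefixlen} vs /{ib.network.prefixlen}")
    # Each address must fall inside the other end's subnet.
    if ib.ip not in ia.network or ia.ip not in ib.network:
        problems.append(f"{a.name} and {b.name} are not on the same subnet")
    # The interface must be covered by a network statement or EIGRP ignores it.
    for side, iface in ((a, ia), (b, ib)):
        if not any(iface.ip in ipaddress.ip_network(n) for n in side.networks):
            problems.append(f"{side.name}: {iface.ip} not covered by any EIGRP network statement")
    return problems


a = RouterSide("RouterA", 10, "172.16.251.1/30", ["172.16.0.0/16"])
b = RouterSide("RouterB", 11, "172.16.251.2/24", ["10.0.0.0/8"])
for problem in check_link(a, b):
    print(problem)
```

In this hypothetical pair, RouterB has the wrong AS number, the wrong mask, and no network statement covering its interface, so all three problems are reported; the addresses themselves still overlap, so no subnet error appears.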

The command to clear and reestablish neighbor relationships is:

clear ip eigrp neighbors [ip-address | interface]

Tip

Repeatedly clearing all neighbor relationships causes the loss of routes (and the loss of packets to those routes). Besides, repeatedly issuing clear commands usually does not fix the problem.

Stuck-in-Active

A route is regarded as stuck-in-active (SIA) when DUAL does not receive a response to a query from a neighbor for three minutes, which is the default value of the active timer. DUAL then deletes all routes from that neighbor, acting as if the neighbor had responded with an unreachable message for all routes.
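The rule just described can be modeled in a few lines. The following Python sketch is a simplified illustration of that behavior, not Cisco’s implementation: neighbors whose queries have gone unanswered for the full active timer are treated as if they had replied "unreachable," so every route learned from them is removed. The data structures and the function name expire_sia are hypothetical.

```python
from typing import Dict, Set

ACTIVE_TIMER_S = 180  # default active timer: three minutes


def expire_sia(outstanding: Dict[str, float],
               topology: Dict[str, Dict[str, int]],
               now: float,
               active_timer_s: float = ACTIVE_TIMER_S) -> Set[str]:
    """Simplified model of the SIA rule (not IOS's actual code).

    outstanding: neighbor -> time (seconds) its still-unanswered query was sent
    topology:    prefix -> {neighbor: metric reported by that neighbor}
    Returns the neighbors declared stuck-in-active; their routes are deleted
    from topology and their queries are no longer outstanding.
    """
    stuck = {nbr for nbr, sent in outstanding.items() if now - sent >= active_timer_s}
    for nbr in stuck:
        del outstanding[nbr]
        for paths in topology.values():
            paths.pop(nbr, None)  # act as if nbr answered "unreachable" for everything
    return stuck


# Hypothetical state: the query to 172.16.251.2 went out at t=0, the other at t=100.
outstanding = {"172.16.251.2": 0.0, "172.16.250.2": 100.0}
topology = {
    "172.16.50.0/24": {"172.16.251.2": 3000},
    "172.16.60.0/24": {"172.16.251.2": 3000, "172.16.250.2": 4000},
}
print(expire_sia(outstanding, topology, now=180.0))  # {'172.16.251.2'}
print(topology)
```

Note that the timer is per query, so a neighbor queried later (here at t=100) survives the same check; the metrics are arbitrary placeholders.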

Routers propagate queries through the network if feasible successors are not found, so it can be difficult to catch the culprit router (i.e., the router that is not responding to the query in time). The culprit may be running high on CPU utilization or may be connected via low-bandwidth links. Going back to TraderMary’s network, when NewYork queries Ames for 172.16.50.0, it marks the route as active and lists the neighbor from which it is expecting a reply (line 28):

   NewYork#sh ip eigrp topology
   IP-EIGRP Topology Table for process 10

   Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
       r - Reply status

   ...
   A 172.16.50.0/24, 0 successors, FD is 2195456, Q
       1 replies, active 00:00:06, query-origin: Local origin
       Remaining replies:
28          via 172.16.251.2, r, Serial1 

If this route were to become SIA, the network engineer should trace the path of the queries to see which router has been queried, has no outstanding queries itself, and yet is taking a long time to answer.

Starting from NewYork, the next router to check for SIA routes would be 172.16.251.2 (line 28). Finding the culprit router in large networks is a difficult task, because queries fan out to a large number of routers. Checking the router logs would give a clue as to which router(s) had the SIA condition.
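The search described above is a walk along the "Remaining replies" lists, starting at the router that originated the query. The following Python sketch performs that walk, assuming you have collected, for each router, the neighbors it is still waiting on (from show ip eigrp topology). A router that appears in someone’s remaining-replies list but is not waiting on anyone itself is a culprit candidate. Apart from NewYork and Ames, the router names are hypothetical.

```python
from typing import Dict, List


def find_culprits(waiting_on: Dict[str, List[str]], origin: str) -> List[str]:
    """Trace outstanding queries from origin to the routers that are holding them up.

    waiting_on: router -> neighbors it is still waiting for a reply from
                (the "Remaining replies" list for the active route).
    A culprit has been queried, has no outstanding queries itself,
    and still has not answered.
    """
    culprits: List[str] = []
    seen = {origin}
    stack = [origin]
    while stack:
        router = stack.pop()
        for nbr in waiting_on.get(router, []):
            if nbr in seen:
                continue
            seen.add(nbr)
            if waiting_on.get(nbr):
                stack.append(nbr)     # still waiting itself: keep tracing downstream
            else:
                culprits.append(nbr)  # queried, nothing outstanding, yet no reply
    return sorted(culprits)


# Hypothetical snapshot taken while 172.16.50.0/24 is active.
waiting_on = {
    "NewYork": ["Ames"],
    "Ames": ["RouterX", "RouterY"],
    "RouterX": [],           # queried, waiting on nobody, has not replied
    "RouterY": ["RouterZ"],  # still waiting on RouterZ
}
print(find_culprits(waiting_on, "NewYork"))  # ['RouterX', 'RouterZ']
```

In a real network the snapshot has to be gathered quickly, router by router, while the route is still active; the sketch only shows the reasoning you would apply to what you collect.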

Increasing the Active Timer

Another option is to increase the active timer. The default value of the active timer is three minutes. If you think the SIA condition is occurring because the network diameter is too large, with several slow-speed links (such as Frame Relay PVCs), it is possible that increasing the active timer will allow enough time for responses to return. The following command shows how to increase the active timer:

router eigrp 10
timers active-time minutes

For the change to be effective, the active timer must be modified on every router in the path of the query.
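Why every router in the path matters can be seen with a rough model. The following Python sketch assumes a simple chain of routers in which each router replies only after the next router downstream has replied, so each router waits for a round trip over every hop beyond it. The per-hop delays are hypothetical and deliberately exaggerated to make the effect visible; real query propagation is more complex than a single chain.

```python
from typing import List, Tuple


def routers_that_time_out(chain: List[Tuple[str, float]],
                          hop_delay_s: List[float]) -> List[str]:
    """Return routers whose active timer expires before their replies come back.

    chain:       [(router, active_timer_s), ...] from the query origin to the
                 last router queried
    hop_delay_s: hop_delay_s[i] is an assumed one-way delay (transmission plus
                 processing) between chain[i] and chain[i + 1]
    Simplified model: router i waits 2 * (sum of all hop delays beyond it).
    """
    if len(hop_delay_s) != len(chain) - 1:
        raise ValueError("need exactly one delay per hop")
    late = []
    for i, (router, timer_s) in enumerate(chain[:-1]):
        wait_s = 2 * sum(hop_delay_s[i:])
        if wait_s > timer_s:
            late.append(router)
    return late


hops = [40, 50, 50]  # exaggerated, hypothetical per-hop delays in seconds
print(routers_that_time_out(
    [("NewYork", 180), ("Ames", 180), ("RouterX", 180), ("RouterY", 180)], hops))
# ['NewYork', 'Ames']
print(routers_that_time_out(
    [("NewYork", 300), ("Ames", 180), ("RouterX", 180), ("RouterY", 180)], hops))
# ['Ames']
```

Raising the timer only on NewYork still leaves Ames timing out in the middle of the path, which is the point made above.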

EIGRP Bandwidth on Low-Speed Links

EIGRP limits itself to using no more than 50% of the configured bandwidth on router interfaces. There are two reasons for this:

  1. Generating more traffic than the interface can handle would cause drops, thereby impairing EIGRP performance.

  2. Generating a lot of EIGRP traffic would result in little bandwidth remaining for user data.

EIGRP uses the bandwidth that is configured on an interface to decide how much EIGRP traffic to generate. If the configured bandwidth does not match the physical bandwidth (the network architect may have put in an artificially low or high bandwidth value to influence routing decisions), EIGRP may generate too little or too much traffic, and either way it can run into problems. If such constraints make it difficult to change the bandwidth command on an interface, allocate a higher or lower percentage to EIGRP with the following command in interface configuration mode (the percentage may exceed 100, which is useful when the configured bandwidth is artificially low):

ip bandwidth-percent eigrp AS-number percentage
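The arithmetic behind choosing a percentage is simple: EIGRP’s ceiling is the configured bandwidth times the percentage, so to hold EIGRP to a given share of the physical rate you scale by the ratio of physical to configured bandwidth. The following Python sketch does that calculation, assuming bandwidths in kbps (as the bandwidth command uses) and an integer percentage; the function names and link speeds are hypothetical.

```python
def eigrp_budget_kbps(configured_bw_kbps: float, percent: float = 50) -> float:
    """Ceiling EIGRP applies to itself, derived from the *configured* bandwidth."""
    return configured_bw_kbps * percent / 100


def percent_for_target(configured_bw_kbps: float, physical_bw_kbps: float,
                       target_share: float = 50) -> int:
    """Percentage that limits EIGRP to target_share% of the *physical* rate."""
    return max(1, round(physical_bw_kbps * target_share / configured_bw_kbps))


# Hypothetical: "bandwidth 64" configured on a 256-kbps PVC to steer routing.
print(eigrp_budget_kbps(64))        # 32.0 kbps, only an eighth of the real link
pct = percent_for_target(64, 256)
print(pct)                          # 200
print(eigrp_budget_kbps(64, pct))   # 128.0 kbps, half of the physical 256 kbps
print(f"ip bandwidth-percent eigrp 10 {pct}")
```

The opposite case, a high configured bandwidth on a slow link, yields a small percentage: 1544 kbps configured on a 56-kbps line gives 2, which keeps EIGRP to roughly 31 kbps.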

Network Logs

Check the output of the show logging command for EIGRP/DUAL messages. For example, the following message:

%DUAL-3-SIA: Route XXX stuck-in-active state in IP-EIGRP

indicates that the route XXX was SIA.
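On a busy router the log can be long, so it helps to pull out and count the SIA messages per route. The following Python sketch scans show logging output for the %DUAL-3-SIA message text shown above and captures whatever appears between "Route" and "stuck-in-active". The sample log lines are hypothetical, and the exact wording after "IP-EIGRP" may vary between IOS releases, which is why the pattern stops at "stuck-in-active".

```python
import re
from collections import Counter
from typing import Iterable

SIA = re.compile(r"%DUAL-3-SIA: Route (.+?) stuck-in-active")


def sia_routes(log_lines: Iterable[str]) -> Counter:
    """Count %DUAL-3-SIA messages per route in 'show logging' output."""
    counts: Counter = Counter()
    for line in log_lines:
        m = SIA.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Hypothetical log excerpt.
LOG = """\
00:10:01: %DUAL-3-SIA: Route 172.16.50.0/24 stuck-in-active state in IP-EIGRP 10
00:15:02: some unrelated message
00:20:01: %DUAL-3-SIA: Route 172.16.50.0/24 stuck-in-active state in IP-EIGRP 10
"""
print(sia_routes(LOG.splitlines()))  # Counter({'172.16.50.0/24': 2})
```

Running the same scan on several routers and comparing timestamps is one way to follow the clue mentioned earlier about which router(s) had the SIA condition.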

IOS Version Check, Bug Lists

The EIGRP implementation was enhanced in IOS Releases 10.3(11), 11.0(8), and 11.1(3) with respect to its performance on Frame Relay and other low-speed networks. In the event of chronic network problems, check the IOS versions in use in your network. Also use the bug navigation tools available on the Cisco web site.
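A quick inventory check against the three releases named above can be scripted. The following Python sketch extracts a version such as 11.0(8) from a version string and compares its rebuild number with the release in the same train. It deliberately returns None for any train not listed in the text, because this section does not say which other releases carry the enhancements; for those, use Cisco’s bug tools. The table name ENHANCED_FROM is hypothetical.

```python
import re
from typing import Optional

# Releases named in the text as carrying the EIGRP low-speed/Frame Relay enhancements.
ENHANCED_FROM = {"10.3": 11, "11.0": 8, "11.1": 3}

VERSION = re.compile(r"(\d+\.\d+)\((\d+)")


def has_eigrp_enhancement(version: str) -> Optional[bool]:
    """True/False for the trains listed above; None (check Cisco's bug tools) otherwise."""
    m = VERSION.search(version)
    if not m:
        return None
    train, rebuild = m.group(1), int(m.group(2))
    if train not in ENHANCED_FROM:
        return None
    return rebuild >= ENHANCED_FROM[train]


for v in ["10.3(4)", "11.0(8)", "11.1(3a)", "12.0(5)T"]:
    print(v, has_eigrp_enhancement(v))
# 10.3(4) False
# 11.0(8) True
# 11.1(3a) True
# 12.0(5)T None
```

A letter suffix such as the "a" in 11.1(3a) is ignored, so it is treated as at least 11.1(3).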

Debug Commands

As always, use debug commands in a production network only after careful thought. A debug that floods the console or overloads the CPU can leave you with no option but to reboot the router, which is a very unappetizing prospect. The following is a list of EIGRP debug commands:

  • debug eigrp neighbors (for neighbor-relationship activity)

  • debug eigrp packet (all EIGRP packets)

  • debug eigrp ip neighbor (when used together with the previous two commands, limits their output to EIGRP packets for the specified neighbor)
