218 Event Management and Best Practices
If we define an event as four occurrences and one hole when we configure our
profile, the event is generated when the fourth indication occurs. If two
consecutive holes occur, the occurrence count is reset to zero and, if the
profile is configured to send one, a clearing event is sent.
Figure 6-12 IBM Tivoli Monitoring holes versus occurrences chart
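The occurrences and holes rule above behaves like a small state machine. The following is a minimal sketch, not IBM Tivoli Monitoring code. It assumes that a cycle whose metric value exceeds the threshold is an indication and that any other cycle is a hole. It also assumes that a clearing event is sent only if the event had already been raised. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OccurrenceHoleTracker:
    """Hypothetical model of an occurrences/holes profile setting."""
    occurrences: int            # indications needed before the event is raised
    max_holes: int              # consecutive holes tolerated before a reset
    send_clearing: bool = True  # whether a clearing event is configured
    _count: int = field(default=0, init=False)
    _holes: int = field(default=0, init=False)
    _event_open: bool = field(default=False, init=False)

    def cycle(self, value: float, threshold: float) -> Optional[str]:
        """Process one monitoring cycle; return 'EVENT', 'CLEARING', or None."""
        if value > threshold:
            # Indication (assumed: value strictly above the threshold).
            self._holes = 0
            self._count += 1
            if self._count == self.occurrences and not self._event_open:
                self._event_open = True
                return "EVENT"
            return None
        # Hole: a cycle without an indication.
        self._holes += 1
        if self._holes > self.max_holes:
            # Too many consecutive holes: reset the occurrence count.
            was_open = self._event_open
            self._count = 0
            self._holes = 0
            self._event_open = False
            # Assumption: clear only an event that was actually raised.
            if was_open and self.send_clearing:
                return "CLEARING"
        return None


tracker = OccurrenceHoleTracker(occurrences=4, max_holes=1)
values = [12, 5, 13, 14, 15, 3, 2]   # metric values, threshold of 10
results = [tracker.cycle(v, threshold=10) for v in values]
print(results)  # [None, None, None, None, 'EVENT', None, 'CLEARING']
```

The sequence mirrors the chart. One isolated hole falls between indications #1 and #2 and is tolerated. The event fires on indication #4, and the two consecutive holes that follow produce the clearing event.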
6.3 Correlation
This section focuses on performing correlation with IBM Tivoli NetView, IBM
Tivoli Switch Analyzer, IBM Tivoli Enterprise Console, and IBM Tivoli Monitoring.
6.3.1 Correlation with NetView and IBM Tivoli Switch Analyzer
We now focus on the available methods of correlation with NetView and IBM
Tivoli Switch Analyzer.
Router Fault Isolation (RFI)
NetView performs status polling for the machines it manages, using either
Internet Control Message Protocol (ICMP) pings or SNMP queries. The polling
interval can be set for individual devices or for groups of devices; otherwise,
the default in the /usr/OV/conf/ovsnmp.conf file applies. Traditionally, if NetView
did not receive a response to its status poll, it marked the node or router down
and issued a down trap.
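The choice between a per-device setting, a group setting, and the default can be pictured as a lookup with precedence. This is a hypothetical sketch and does not parse the real ovsnmp.conf format. It assumes the following order: an entry for the individual node wins, then the first matching group entry expressed as an IP wildcard pattern, then the global default. It also assumes intervals are expressed in seconds.

```python
from fnmatch import fnmatch
from typing import Dict


def resolve_poll_interval(hostname: str, ip: str,
                          per_node: Dict[str, int],
                          per_group: Dict[str, int],
                          default: int) -> int:
    """Return the status-polling interval (seconds) for one device.

    Hypothetical precedence: individual node entry, then the first
    matching group (IP wildcard) entry, then the global default.
    """
    if hostname in per_node:
        return per_node[hostname]
    for pattern, interval in per_group.items():  # dicts keep insertion order
        if fnmatch(ip, pattern):
            return interval
    return default


per_node = {"core-rtr1": 60}     # hypothetical individual-device setting
per_group = {"10.1.*": 120}      # hypothetical group setting
print(resolve_poll_interval("core-rtr1", "10.1.0.1", per_node, per_group, 300))  # 60
print(resolve_poll_interval("host7", "10.1.2.3", per_node, per_group, 300))      # 120
print(resolve_poll_interval("host9", "192.168.0.9", per_node, per_group, 300))   # 300
```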
Chapter 6. Event management products and best practices 219
During a network failure, the path from the NetView server to portions of the
network is often broken. Before router fault isolation, NetView attempted to poll
the devices in the unreachable part of the network and generated down traps when
they did not answer. This resulted in many segment, node, and interface down
traps, particularly in networks with a large number of nodes on the far sides of
routers. When the failure was corrected, NetView generated numerous up traps,
one for each device it could again reach successfully.
This plethora of events had several drawbacks:
• It made the original cause of the network failure harder to determine.
• It slowed network traffic considerably because of the large number of status
  polls sent to the occluded area.
• It created performance problems and unreliable status reports if the events
  were forwarded to the IBM Tivoli Enterprise Console and IBM Tivoli
  Enterprise Data Warehouse.
RFI overview
The RFI function rectifies these problems. When NetView detects that a node or
interface is down, RFI first checks the status and accessibility of the router
interfaces connected to the subnet on which the node or interface resides. During
the router check, each interface and its subnet are analyzed. An unresponsive
interface triggers checks of that interface and of any connected routers.
RFI generates the appropriate Router Down or Router Marginal traps for the
conditions it detects. It also simplifies notification by issuing a single summary
alert that identifies the router nearest the fault.
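The idea of reporting only the router nearest the fault can be illustrated with a breadth-first search outward from the management station. This is a deliberately simplified, hypothetical model, not NetView's actual RFI algorithm. It assumes the topology is known as an adjacency list and that "nearest" means fewest hops. It treats devices reachable only through unresponsive devices as occluded, so no individual down traps are raised for them. All function and device names are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def isolate_fault(topology: Dict[str, List[str]], station: str,
                  routers: Set[str], responsive: Set[str]) -> Optional[dict]:
    """Simplified, hypothetical RFI-style fault isolation.

    Expands only devices that answer status polls. Unresponsive routers at
    the edge of the reachable region are fault candidates; the nearest one
    (fewest hops) goes into a single summary, and devices that were never
    reached are listed as occluded instead of getting their own down traps.
    """
    dist = {station: 0}
    queue = deque([station])
    candidates: List[str] = []
    while queue:
        node = queue.popleft()
        for nbr in topology.get(node, []):
            if nbr in dist:
                continue
            dist[nbr] = dist[node] + 1
            if nbr in responsive:
                queue.append(nbr)        # reachable: keep searching past it
            elif nbr in routers:
                candidates.append(nbr)   # silent router on the edge
    if not candidates:
        return None
    all_nodes = set(topology) | {n for nbrs in topology.values() for n in nbrs}
    return {
        "nearest_router": min(candidates, key=dist.__getitem__),
        "occluded": sorted(all_nodes - set(dist)),
    }


# Hypothetical network: nv -- r1 -- r2, with hosts behind each router.
topology = {
    "nv": ["r1"],
    "r1": ["nv", "r2", "hostA"],
    "r2": ["r1", "hostB", "hostC"],
    "hostA": ["r1"], "hostB": ["r2"], "hostC": ["r2"],
}
routers = {"r1", "r2"}
responsive = {"nv", "r1", "hostA"}   # r2 and everything behind it are silent
print(isolate_fault(topology, "nv", routers, responsive))
# {'nearest_router': 'r2', 'occluded': ['hostB', 'hostC']}
```

One summary about r2 replaces the three down traps that polling r2, hostB, and hostC individually would otherwise produce.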
When active, the Router Fault Isolation feature generates the events shown in
Table 6-4 to alert users to important status changes.
Table 6-4 Router fault isolation events
Event               Network status
Router Marginal     At least one router interface is down, and at least one other
                    interface on that router is up.
Router Down         None of the router's interfaces are responding, but at least
                    one connected subnet is reachable. (The router is not in an
                    occluded region.)
Router Unreachable  The network management workstation cannot query the router
                    because the router is in an occluded region.
Router Up           All the interfaces have responded successfully. This event is
                    issued on initial discovery and after recovery from one or
                    more interfaces being down.
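The status conditions in Table 6-4 can be expressed as a small classification function. This is an illustrative sketch, not NetView code. It assumes that the per-interface poll results and an "occluded" flag are already known. Here, occluded means that no subnet connected to the router is reachable from the management workstation. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import Dict


class RouterEvent(Enum):
    UP = "Router Up"
    MARGINAL = "Router Marginal"
    DOWN = "Router Down"
    UNREACHABLE = "Router Unreachable"


def classify_router(interfaces_up: Dict[str, bool], occluded: bool) -> RouterEvent:
    """Map polled interface states to a Table 6-4 event (illustrative only).

    interfaces_up: interface name -> True if it answered the status poll.
    occluded: True if no connected subnet is reachable from the workstation.
    """
    if occluded:
        return RouterEvent.UNREACHABLE   # router cannot be queried at all
    if not interfaces_up:
        raise ValueError("router has no known interfaces")
    up = sum(interfaces_up.values())     # True counts as 1
    if up == len(interfaces_up):
        return RouterEvent.UP
    if up == 0:
        return RouterEvent.DOWN          # all silent, but not occluded
    return RouterEvent.MARGINAL          # some interfaces up, some down


print(classify_router({"eth0": True, "eth1": False}, occluded=False).value)
# Router Marginal
```

Because the table says Router Up is issued on initial discovery and after a recovery, a caller would likely emit an event only when the classification differs from the previous poll. That transition logic is left out of this sketch.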