Chapter 2. Event management categories and best practices 65
The same reasons apply here as for unhandled problems since the increased
severity again implies less time to resolve the underlying issue (see
“Unhandled events” on page 60).
In addition, if a monitored variable reported in the event is governed by SLAs,
notify those responsible for the SLAs when the reported value is about to cause
or has already caused a violation.
• Do not de-escalate for lessened conditions.
Sometimes the term de-escalation is used to denote lowering the severity of
an event. The new severity can indicate a lessened or resolved problem.
De-escalate events only when they are resolved. There are several reasons
for this. For example, you do not want to inform someone late at night about a
critical problem only to downgrade it to a warning shortly afterward. Also, a
problem may oscillate between two severities, in which case the event
processors incur unnecessary overhead by repeatedly changing the event
severity.
Most monitoring agents do not send events to indicate a lessened severity.
They normally inform as a problem increases in severity, and re-arm only
when the condition is resolved.
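The severity policy described above (escalate when a condition worsens, ignore
reports of a lessened condition, and lower the severity only on resolution) can
be sketched as follows. This is a minimal illustration, not the behavior of any
particular event processor: the Severity scale, the Event class, and the
apply_update function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale; real event processors define their own.
    CLEARED = 0
    WARNING = 1
    MINOR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Event:
    key: str            # identifies the monitored resource and problem
    severity: Severity  # severity currently shown to operators


def apply_update(current: Event, reported: Severity) -> Event:
    """Return the event after applying a newly reported severity."""
    if reported == Severity.CLEARED:
        # Problem resolved: the only case in which severity is lowered.
        return Event(current.key, Severity.CLEARED)
    if reported > current.severity:
        # Worsening condition: escalate to the higher severity.
        return Event(current.key, reported)
    # Lessened or unchanged condition: keep the existing severity so that
    # an oscillating problem does not cause repeated severity changes.
    return current
```

With this policy, an event that rises from WARNING to CRITICAL and then
reports WARNING again stays at CRITICAL until a resolution is received.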
2.7.2 Implementation considerations
Escalation can be performed automatically by a capable event processor or
manually by console operators. Therefore, consider the first best practice in the
list that follows.
Any monitoring agent or tool capable of the function can escalate a problem. The
best place to perform escalation depends on both the tools used and the type of
escalation. Consider the last two best practices in the list that follows.
For implementation, consider the following best practices:
• Automate escalation whenever possible.
When escalation is automated for unhandled problems, it occurs as soon as
an acceptable, predefined time interval has expired. Similarly, escalations
for worsening conditions and for business impact, when automated, occur
immediately upon receipt of the relevant events. Operators perform escalation
less precisely, because they escalate only when they notice that a condition
requires it.
If you do not have a well-defined escalation process, or it is too complicated
to escalate using your toolset, allow operators to do it. Holding them
accountable for ensuring the timely handling of problems gives them an
incentive to perform the required escalation.
• Use the trouble-ticketing system to escalate problems that do not receive
timely action.
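
The automated escalation of unhandled problems described in the list above,
which fires once a predefined interval expires without action, can be sketched
as follows. This is an illustration under stated assumptions rather than a
feature of any specific product: the OpenEvent class, the
escalate_if_unhandled function, the numeric severity scale, and the 30-minute
interval are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_SEVERITY = 3                             # assumed top of the scale
ESCALATION_INTERVAL = timedelta(minutes=30)  # assumed acceptable interval


@dataclass
class OpenEvent:
    key: str
    severity: int                # higher means more severe (hypothetical)
    timer_started_at: datetime   # when the current waiting period began
    acknowledged: bool = False   # True once someone has taken ownership


def escalate_if_unhandled(event: OpenEvent, now: datetime,
                          interval: timedelta = ESCALATION_INTERVAL) -> bool:
    """Raise the severity of an unacknowledged event whose interval expired.

    Returns True if the event was escalated, False otherwise.
    """
    if event.acknowledged:
        return False             # someone is handling it; nothing to do
    if now - event.timer_started_at < interval:
        return False             # interval has not yet expired
    if event.severity >= MAX_SEVERITY:
        return False             # already at the highest severity
    event.severity += 1
    event.timer_started_at = now # restart the timer for the next level
    return True
```

An event processor or scheduler would call escalate_if_unhandled
periodically for each open event; an operator acknowledging the event stops
further automatic escalation.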
