Chapter 6. IBM System x3850 X5 and x3950 X5 297
IBM XIV Storage System: Architecture, Implementation, and Usage, SG24-7659
This book describes the concepts, architecture, and implementation of the IBM XIV®
Storage System:
http://www.redbooks.ibm.com/abstracts/sg247659.html
IBM Midrange System Storage Hardware Guide, SG24-7676
This book consolidates, in one document, detailed descriptions of the hardware
configurations and options offered as part of the IBM Midrange System Storage servers,
which include the IBM System Storage DS4000 and DS5000 families of products:
http://www.redbooks.ibm.com/abstracts/sg247676.html
For more information regarding HBA storage-specific settings and zoning, contact your SAN
vendor or storage vendor.
6.11 Failure detection and recovery
This section provides an overview of tools available to assist with problem resolution for the
x3850 X5 in any given configuration. It also provides considerations for extended outages.
6.11.1 What happens when a node fails or the MAX5 fails
If a power problem causes one or both nodes to fail, or the MAX5 is no longer supplied with
power, the complex configuration shuts down to avoid damage such as data loss or data
corruption. No OS can handle this sudden change to the system.
The MAX5 is turned off only if the connected server issues a power-off request and you have
disconnected the MAX5 power cord from the power source. You cannot turn off the MAX5
expansion module manually.
For recovery options, see 6.11.4, “Recovery process” on page 299.
6.11.2 Reinserting the QPI wrap cards for extended outages
If one node becomes unavailable for any reason, you can boot your system in a single-node
configuration. If you have QPI wrap cards, install them in your
system. The QPI wrap cards are not mandatory, but they provide a performance boost by
ensuring that all CPUs are only one hop away from each other. For more information about
QPI cards, see 3.4.2, “QPI Wrap Card” on page 66.
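The "one hop away" statement can be illustrated with a small hop-count calculation. The following Python sketch is illustrative only: the link layout (BASE_LINKS and WRAP_CARD_LINKS) is a hypothetical four-CPU topology chosen to show the effect, not the documented wiring of the x3850 X5. It assumes that, without wrap cards, the CPUs form a ring, and that the wrap cards add the two links that complete the mesh.

```python
from collections import deque
from itertools import combinations


def max_hops(links, cpus):
    """Return the largest shortest-path hop count between any two CPUs."""
    neighbors = {cpu: set() for cpu in cpus}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def hops(src, dst):
        # Breadth-first search over the links, counting hops from src.
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == dst:
                return dist
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        raise ValueError(f"CPU {dst} is unreachable from CPU {src}")

    return max(hops(a, b) for a, b in combinations(cpus, 2))


CPUS = [1, 2, 3, 4]
# Hypothetical layout, for illustration only (not taken from IBM documentation).
BASE_LINKS = [(1, 2), (2, 3), (3, 4), (4, 1)]   # assumed links without wrap cards
WRAP_CARD_LINKS = [(1, 3), (2, 4)]              # assumed links added by wrap cards

print(max_hops(BASE_LINKS, CPUS))                    # 2
print(max_hops(BASE_LINKS + WRAP_CARD_LINKS, CPUS))  # 1
```

Under these assumptions, the worst-case distance between two CPUs drops from two hops to one when the wrap-card links are present, which is the performance benefit that the text describes.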
For recovery options, see 6.11.4, “Recovery process” on page 299.
6.11.3 Tools to aid hardware troubleshooting for x3850 X5
Use the following tools when troubleshooting problems on the x3850 X5 in any configuration.
Integrated Management Module
The first place to start troubleshooting the x3850 X5 is typically the IMM. Use the links under
the Monitors heading to view the status of the server, as shown in Figure 6-81 on page 298.
298 IBM eX5 Implementation Guide
Figure 6-81 IMM web interface
From the System Status pages, you can perform these tasks:
Monitor the power status of the server and view the state of the OS.
View the server temperature readings, voltage thresholds, and fan speeds.
View the latest server OS failure screen capture.
View the list of users who are logged in to the IMM.
From the Virtual Light Path page, you can view the name, color, and status of any LEDs that
are lit on a server.
From the Event Log page, you can perform these tasks:
View certain events that are recorded in the event log of the IMM.
View the severity of events.
For more information about the IMM, see 9.2, “Integrated Management Module (IMM)” on
page 449.
Light path diagnostics panel
You can use the light path diagnostics to diagnose system errors quickly. When an LED is lit
on the light path diagnostics panel, it helps you to isolate the error. The server is designed so
that LEDs remain lit when the server is connected to an ac power source but is not turned on,
provided that the power supply operates correctly. This feature helps you to isolate the
problem when the OS is shut down.
For more information, see the Problem Determination and Service Guide - IBM System x3850
X5, x3950 X5 (7145, 7146) at the following website:
http://ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5084848
System event log
This log contains POST and system management interrupt (SMI) events and all events that
are generated by the Baseboard Management Controller (BMC) that is embedded in the IMM.
You can view the system event log through the UEFI by pressing F1 at system start-up and
selecting System Event Logs → System Event Log.
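If the system event log is exported as text rather than viewed in the UEFI, it can be filtered with a short script. The following Python sketch assumes a pipe-delimited layout with six fields per record, similar to what an IPMI client such as ipmitool prints for `sel elist`. The SAMPLE records, the SelEntry field names, and the parse_sel helper are hypothetical and are provided for illustration only; they are not output captured from an x3850 X5.

```python
from dataclasses import dataclass


@dataclass
class SelEntry:
    record_id: str
    date: str
    time: str
    sensor: str
    event: str
    state: str


def parse_sel(text):
    """Parse pipe-delimited event log lines into SelEntry records."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 6:
            continue  # skip blank lines and lines in an unexpected layout
        entries.append(SelEntry(*fields))
    return entries


# Hypothetical sample records (assumed layout, invented values).
SAMPLE = """\
   1 | 10/27/2010 | 14:22:11 | Power Supply #0x51 | Presence detected | Asserted
   2 | 10/27/2010 | 14:25:40 | Memory #0x60 | Correctable ECC | Asserted
   3 | 10/27/2010 | 14:31:02 | Power Supply #0x51 | Failure detected | Asserted
"""

entries = parse_sel(SAMPLE)
# Keep only the events that were reported by power supply sensors.
power = [e for e in entries if e.sensor.startswith("Power Supply")]
for e in power:
    print(e.record_id, e.date, e.time, e.event)
```

Filtering by sensor name is only one option; the same records can be grouped by date or by event text when you isolate a power problem such as the one described in 6.11.1.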
POST event log
This log contains the three most recent error codes and messages that were generated
during POST. You can view the POST event log through the UEFI by pressing F1 at system
start-up and selecting System Event Logs → POST Event Viewer.
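Because the POST event log keeps only the three most recent entries, older entries are lost as new ones arrive. The following Python sketch models that retention behavior with a bounded queue. It is a conceptual model only, not the firmware implementation, and the PostEventLog class and the error codes are hypothetical placeholders.

```python
from collections import deque


class PostEventLog:
    """Model of a log that retains only the most recent entries."""

    def __init__(self, capacity=3):
        # A deque with maxlen discards the oldest entry when it is full.
        self._events = deque(maxlen=capacity)

    def record(self, code, message):
        self._events.append((code, message))

    def entries(self):
        return list(self._events)


log = PostEventLog()
# Placeholder codes, for illustration only.
for code in ["CODE-A", "CODE-B", "CODE-C", "CODE-D"]:
    log.record(code, f"message for {code}")

print([code for code, _ in log.entries()])  # ['CODE-B', 'CODE-C', 'CODE-D']
```

The practical consequence is that an error from an earlier boot might no longer be in the POST event log, so check the system event log or the IMM event log for older events.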
IBM Electronic Service Agent
With an appropriate hardware maintenance and warranty contract, Electronic Service
Agent™ enables your system to call home to submit diagnostic information and system
statistics, report a problem, and, if a fix is available, download the solution immediately.
Figure 6-81 menu labels: System; Monitors: System Status, Virtual Light Path, Event Log, Vital Product Data