Chapter 6. Problem determination 205
To upgrade the firmware of the switch, you will need the following items:
• A computer that can access the InfiniBand switch's Ethernet management
port
• The appropriate firmware upgrade files from the switch manufacturer, located
on an FTP or TFTP server
• The administrator account on the switch
The steps to upgrade the switch firmware are:
1. Log in to the Web Interface on the switch. Make sure that your account has
administrator-level privileges.
2. Click the Maintenance menu and select File Management. The File
Management window opens.
3. Click the line in the Current Files on System table that represents the file
that you want to install, then click the Install button. A verification window
opens.
4. Once you click Yes in the next step, the switch will reboot. Make sure your
environment is ready for the switch to go down.
5. Click the Yes button to install the image.
The system installs the new image, and reboots the switch for the changes to
take effect.
Note: Before you install an image make sure that all of the cards on the
chassis have been brought up.
6.2 System p troubleshooting
For more detailed information about how to troubleshoot hardware and firmware
issues on System p Servers, go to:
http://publib.boulder.ibm.com/infocenter/eserver/v1r3s/topic/ipha5/troubleshooting_firm.htm
6.2.1 HCA troubleshooting
For more detailed information about how to resolve issues with your IBM
InfiniBand GX bus adapter, check the following Web page:
http://publib.boulder.ibm.com/infocenter/eserver/v1r3s/topic/iphau/infinibandpdf.pdf
A common problem in a shared adapter configuration is running out of adapter
resources, so plan carefully when you implement shared adapters. It is very
important to know the nature of the jobs that you will run on a shared adapter.
If an application can generate many tasks for the HCA, as an HPC job typically
does, either do not share the adapter or know exactly how many tasks the job
that you are creating will generate.
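The planning step described above comes down to simple arithmetic. The following is a minimal sketch, not an IBM tool: the function names, the per-task resource cost, and the size of the partition's share of the adapter are all hypothetical placeholders. The real values must come from your application and adapter documentation.

```python
def resources_needed(tasks: int, units_per_task: int) -> int:
    """Total HCA resource units a job would consume (hypothetical model)."""
    if tasks < 0 or units_per_task < 0:
        raise ValueError("counts must be non-negative")
    return tasks * units_per_task


def fits_on_shared_adapter(tasks: int, units_per_task: int,
                           units_available: int) -> bool:
    """True if the job fits within this partition's share of the adapter."""
    return resources_needed(tasks, units_per_task) <= units_available
```

With placeholder numbers, a job of 64 tasks that each need 2 resource units requires 128 units, so it fits in a share of 256 units but not in a share of 100.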
In an HPC environment, applications such as Parallel Environment generate
numerous tasks, which in turn consume queue pairs. There are few indicators of
this problem. The only error messages that you will see come from the
application that is trying to create the next queue pair, and they read
something like open queue pair failed. If you see this message on a machine
that uses a shared adapter, and an application is showing degraded
performance, the system administrator should review the HCA priority settings
and raise the priority level of the HCA allocated to this partition.
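Because the application's own output is the only place this symptom shows up, it can help to scan that output automatically. The sketch below is an illustration under stated assumptions: the text only says that the message reads "something like" open queue pair failed, so the pattern is deliberately loose, and the function name is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed wording: the exact message depends on the application, so this
# pattern ignores case and tolerates variable spacing between the words.
QP_FAILURE = re.compile(r"open\s+queue\s+pair\s+failed", re.IGNORECASE)


def find_qp_failures(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like a failed queue pair creation."""
    return [line.rstrip("\n") for line in lines if QP_FAILURE.search(line)]
```

To use it, pass an open file object for the application's log or captured output. Any returned lines on a partition that uses a shared adapter are a cue to review the HCA priority settings.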
6.2.2 Logs available for troubleshooting
When working on InfiniBand related problems, several files can be gathered for
further problem determination. These logs are:
• /var/log/messages from a Linux partition.
• snap.pax.Z data from an AIX partition (generated with the snap -gc
command).
• ibnm snap generated on the HMC (refer to “HMC/IBM Network Manager
troubleshooting” on page 206 for details).
• iqyylog from the HMC, gathered as follows: On the HMC GUI, select Service
Applications → Service Focal Point → Manage Serviceable Events →
ALL and then press OK. Select one SFP log entry, click Selected, click
Manage Problem Data, select iqyylog.log, and then select either Call
Home or Save to DVD.
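The partition-side items in the list above can be summarized as a small lookup from operating system to log source. This is a minimal sketch for illustration only: the function names are hypothetical, the code only describes the steps rather than running them, and the HMC items are left out because they are gathered through the HMC itself.

```python
import platform
from typing import List


def log_collection_steps(os_name: str) -> List[str]:
    """Map a partition's operating system to the log source named in the text."""
    name = os_name.strip().lower()
    if name == "linux":
        return ["copy /var/log/messages"]
    if name == "aix":
        return ["run: snap -gc", "collect the resulting snap.pax.Z"]
    raise ValueError(f"no log source listed for {os_name!r}")


def steps_for_this_host() -> List[str]:
    """Look up the steps for the machine this script runs on."""
    # platform.system() reports "Linux" on Linux and "AIX" on AIX.
    return log_collection_steps(platform.system())
```

Calling log_collection_steps("AIX") returns the snap -gc step followed by a reminder to collect snap.pax.Z. Any operating system that the list does not mention raises ValueError rather than guessing.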
6.2.3 HMC/IBM Network Manager troubleshooting
An extensive guide to troubleshooting using IBM Network Manager is available
at:
http://publib.boulder.ibm.com/infocenter/eserver/v1r3s/topic/tpspnelemgrug.pdf
The process for gathering error logs on IBM Network Manager can be found in
“Gathering the IBNM snap data” on page 68.
