206 Implementing InfiniBand on IBM System p
A common problem in a shared adapter configuration is running out of
resources. Plan carefully when implementing shared adapters. It is very
important to know the nature of the jobs that you will be running on a shared
adapter. If any application can generate tasks for the HCA, such as an HPC-type
job, then you should either not share the adapter or know exactly how many
tasks the job you are creating will generate.
In an HPC environment, applications like Parallel Environment generate
numerous tasks, which in turn consume queue pairs. There are few indicators that
you are having this problem. The only error messages that you will get come
from the application trying to create the next queue pair. The application
delivers a message that says something like open queue pair failed. If you
see this message on a machine that is using a shared adapter, and you are
seeing degraded performance on an application, then the system administrator
should review the HCA priority settings and raise the priority level of the HCA
allocated to this partition.
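Because the application's own output is the only place this failure shows up, it can help to search that output for the message. The following is a minimal sketch, not an IBM-supplied tool: the function name find_qp_failures is hypothetical, and the pattern assumes the approximate wording quoted above, so adjust it to whatever your application actually prints.

```python
import re
from typing import Iterable, List, Tuple

# Assumed wording: the text only says the message reads "something like"
# this, so change the pattern to match the application's real output.
QP_FAILURE = re.compile(r"open\s+queue\s+pair\s+failed", re.IGNORECASE)


def find_qp_failures(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that matches the pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if QP_FAILURE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open the application's log file and pass the file object to find_qp_failures. Any hits on a partition that uses a shared adapter are a cue to review the HCA priority settings as described above.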
6.2.2 Logs available for troubleshooting
When working on InfiniBand related problems, several files can be gathered for
further problem determination. These logs are:
/var/log/messages from a Linux partition.
snap.pax.Z data from an AIX partition (generated with the snap -gc command).
IBNM snap generated on the HMC (refer to “HMC/IBM Network Manager
troubleshooting” on page 206 for details).
iqyylog from HMC can be gathered like this: On the HMC GUI, select Service
Applications → Service Focal Point → Manage Serviceable Events →
ALL and then press OK. Select one SFP log entry, click Selected, click
Manage Problem Data, select iqyylog.log, and then either select Call
Home or Save to DVD.
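Once the files have been copied to one place, a quick check can confirm that nothing is missing before the data is sent on. This is a sketch under assumptions: the function name check_collected_logs and the idea of a single collection directory are hypothetical, and the IBNM snap is left out because its file name is not given here.

```python
from pathlib import Path
from typing import Dict

# File names taken from the list above. The IBNM snap is omitted because
# this section does not state its file name.
EXPECTED_FILES = {
    "Linux partition": "messages",
    "AIX partition": "snap.pax.Z",
    "HMC": "iqyylog.log",
}


def check_collected_logs(bundle_dir: str) -> Dict[str, bool]:
    """Map each expected file name to True if it exists under bundle_dir."""
    root = Path(bundle_dir)
    found = {}
    for name in EXPECTED_FILES.values():
        # rglob searches bundle_dir and all of its subdirectories.
        found[name] = any(p.is_file() for p in root.rglob(name))
    return found
```

Calling check_collected_logs on the collection directory returns a dictionary keyed by file name, so any entry that is False points to a log that still has to be gathered.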
6.2.3 HMC/IBM Network Manager troubleshooting
An extensive guide to troubleshooting using IBM Network Manager is available
The process for gathering error logs on IBM Network Manager can be found in
“Gathering the IBNM snap data” on page 68.